Image Recognition Engine with GenAI Explanation

Categories: code, analysis, deploy

Author: Niharika Balachandra

Published: February 3, 2024

The Deployed Application

Introduction: Unpacking the Application

I wanted to develop an Image Recognition Engine that not only identifies images but also offers an initial explanation of the recognized image using an open-source, large language model. The ultimate goal was to create a self-contained application that goes beyond image recognition and also serves as a starting point to gain an understanding of what an identified image is all about.

The application shown above focuses on the recognition of flowers and is built on the 102 Category Flower Dataset.

There are three main components to the app, and I explain each component in more detail in the following sections:

  • Image Recognition
  • Generating Image Explanation via LLM
  • Streamlit Application

Image Recognition

Model Tuning

The idea is to use transfer learning to fine-tune a pre-trained vision model. I use the Resnet50 pre-trained model for this project, but there are a host of other pre-trained vision models you can find via pytorch-image-models (timm). The easiest way to fine-tune the Resnet50 pre-trained model is using the fastai library. Here is the notebook I used to generate a pickled version of the fine-tuned model.
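To make the transfer-learning step concrete, here is a minimal sketch of what the fine-tuning can look like with fastai. This is illustrative rather than the exact notebook code: it assumes the dataset has been arranged into one folder per flower label, and the local path and epoch count are placeholders.

Code
from fastai.vision.all import *

# Hypothetical local path; assumes one sub-folder per flower label
path = Path('flowers')

# Build dataloaders from the folder structure, holding out 20% for validation
dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,
    item_tfms=Resize(224),  # Resnet50 expects 224x224 inputs
)

# Transfer learning: start from Resnet50 weights and fine-tune on flower labels
learn = vision_learner(dls, resnet50, metrics=error_rate)
learn.fine_tune(4)

# Serialize the fine-tuned learner to export.pkl for hosting
learn.export('export.pkl')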

Deploying on Hugging Face

We now need to host this serialized model on Hugging Face and interact with it via the huggingface_hub Python library. Uploading a model to Hugging Face is really simple via the UI, and this doc walks through the process in detail. The following code can then be used to load the hosted model into memory so we can build the app around it.

Code
from huggingface_hub import hf_hub_download

REPO_ID = "username/huggingface_repo"
FILENAME = "export.pkl"

# Download the serialized model from the Hugging Face Hub into the local cache
model_file_name = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
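As an aside, the upload itself can also be scripted instead of going through the UI. Here is a sketch using the huggingface_hub API, assuming you are already authenticated (for example via huggingface-cli login) and reusing the placeholder repo name from above:

Code
from huggingface_hub import HfApi

# Push the pickled model to the (placeholder) model repo
api = HfApi()
api.upload_file(
    path_or_fileobj="export.pkl",
    path_in_repo="export.pkl",
    repo_id="username/huggingface_repo",
    repo_type="model",
)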

Architecture

---
title: Model Fine-tuning
---
flowchart LR
  A(Resnet50 Pre-trained Model) --> B(Fine-tuned for Identification of Flowers)
  B --> C(Serialized Model)

---
title: Deploying Model on Hugging Face
---
flowchart LR
  C(Serialized Fine-tuned Model) --> D(Host Serialized Model on Hugging Face)
  D --> E(Load Model as needed)

Code
from fastai.vision.all import *
from huggingface_hub import hf_hub_download

REPO_ID = "<hugging_face_repo_hosting_fine-tuned_model>"
FILENAME = "export.pkl"

# Download the serialized learner from the Hub and load it for inference
model_file_name = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
learn_inf = load_learner(model_file_name)
labels = learn_inf.dls.vocab

def predict(img):
    img = PILImage.create(img)
    pred, pred_idx, probs = learn_inf.predict(img)
    # Map each label to its probability, sorted from most to least likely
    dict_ = {labels[i]: float(probs[i]) for i in range(len(labels))}
    dict_ = dict(sorted(dict_.items(), key=lambda x: x[1], reverse=True))
    best_class = list(dict_.keys())[0]
    return dict_, best_class

# file_name and col1 come from the Streamlit app (file uploader and layout column)
image = PILImage.create(file_name)
col1.image(image, use_column_width=True)
predictions, best_class = predict(image)

Generating Image Explanation via LLM

Deploying Quantized LLM on Hugging Face

The idea here is to host a dockerized version of a quantized, open-source LLM on the free CPU tier of Hugging Face Spaces and expose it via an API endpoint we can then interact with at no cost. I chose Zephyr 7B Alpha, a fine-tuned version of Mistral AI’s Mistral-7B-v0.1.

My implementation of this step is inspired by this blog post, which goes over the exact process of getting to an API endpoint.

Architecture

---
title: Deploying Quantized LLM on Hugging Face
---
flowchart LR
  C(FastAPI Function for LLM completion) --> D(Containerize our Application)
  D --> E(Deploy via Hugging Face Spaces)
  E --> F(FastAPI Endpoint)
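For context, here is a minimal sketch of what the FastAPI completion function on the Space side could look like. It assumes the quantized Zephyr weights are loaded with the ctransformers library; the GGUF repo id, file name, and /generate route are illustrative, not necessarily what the linked blog post uses.

Code
from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the quantized model once at startup; it runs on CPU
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/zephyr-7B-alpha-GGUF",           # illustrative repo id
    model_file="zephyr-7b-alpha.Q4_K_M.gguf",  # illustrative quantized file
    model_type="mistral",
)

class CompletionRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(request: CompletionRequest):
    # Run the completion and return JSON for the client code below
    return {"text": llm(request.prompt, max_new_tokens=128)}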

Code
import requests
import os

# The Hugging Face token (read access) is stored as a Space secret
HF_TOKEN = os.environ.get('<hugging_face_token_with_read_access>')
best_class = '<identified_image_label>'
API_URL = "<API_endpoint_of_hugging-face_hosted_LLM-model>"

request_auth = f'Bearer {HF_TOKEN}'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': request_auth,
}
# Prompt the LLM for a short explanation of the identified label
json_data = {
    'prompt': f'what is a {best_class} in 2 sentences?'
}

response = requests.post(
    API_URL,
    headers=headers,
    json=json_data,
)

try:
    response_txt = response.json()
except Exception as e:
    print(f'Error details: {e}')
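Note that the exact shape of the returned JSON depends on how the FastAPI wrapper defines its response; with the sketch above, the generated text would be available as response_txt['text'].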

Streamlit Application

Deploying Streamlit Application on Hugging Face Spaces

I start by creating a new Streamlit Hugging Face Space. I then build the Streamlit app that integrates all three components: it loads the image recognition model, builds out the interface to upload images and display results, and finally passes the predicted label via a prompt to the LLM to generate a high-level explanation of the identified flower species. You can refer to the app.py file for the full details of how everything is stitched together; a rough sketch follows below.
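Here is a minimal sketch of how app.py could stitch these pieces together. The predict function is the one from the image-recognition snippet above, while ask_llm is a hypothetical helper wrapping the requests call shown earlier; the two-column layout is an assumption, not necessarily the exact one in the deployed app.

Code
import streamlit as st

st.title("Image Recognition Engine with GenAI Explanation")

uploaded = st.file_uploader("Upload a flower image")
if uploaded is not None:
    col1, col2 = st.columns(2)

    # Show the uploaded image alongside the model's top prediction
    col1.image(uploaded, use_column_width=True)
    predictions, best_class = predict(uploaded)
    col2.write(f"Identified species: {best_class}")

    # Ask the hosted LLM for a short explanation of the identified species
    col2.write(ask_llm(f'what is a {best_class} in 2 sentences?'))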

Architecture

---
title: Streamlit Application on Hugging Face Spaces
---
flowchart LR

D(Build App User Interface)
D --> B(Upload Flower Image)
D --> C(Load Image Recognition Model)
B --> C
C --> E{Identify Flower Species}
E --> N(Output Identified Flower Species)

E --> F(Pass Identified Label to LLM)
F --> G(Generate LLM Response)
G --> O(Output Response on Identified Flower Species)
O --> D

Reference

Model tuning | Image Recognition

  • My approach was inspired by this Kaggle notebook.

Deploying Quantized LLM on Hugging Face | Generating Image Explanation via LLM

  • The blog post that goes over the exact process of getting to an API endpoint
