Noodling About - Experiments, Writing & More

Fiddling with the OpenAI CLIP Neural Network

Having experimented quite a bit with LLMs, GenAI and "Retrieval-Augmented Generation", and having found it all somewhat frustrating, I noodled around looking for other interesting technology to explore and came across OpenAI's CLIP neural network.

So What Is CLIP?

OpenAI’s CLIP (Contrastive Language–Image Pre-training)

OpenAI CLIP (source code here) is a neural network that can understand both images and text in the same "conceptual semantic space". What that means, in more human terms, is that it has learned how images and their natural language descriptions relate. Instead of being trained on one narrow task (e.g. "detect cats"), it was trained on hundreds of millions of image–caption pairs, so it can handle far broader tasks that connect visual content with open-ended human concepts, e.g. “classify photos of healthy leaves versus photos of diseased leaves”, without having been specifically trained on healthy and unhealthy leaves.

Two Things I'm Particularly Interested In:

1: Labelling an image's content: Based on the descriptions at https://openai.com/index/clip/ and https://github.com/openai/CLIP, it seems that CLIP should be able to say, for example, whether a dog has black fur, or what type of bicycle is in a photo.

2: Comparing images: It turns out that a side-effect of CLIP is that it's actually very good at comparing the contents of images e.g. "are these two dogs similar"?

Let's Try It Out!

We have four test images:

Bike 1 and Bike 2:

A bicycle - bike 1

A bicycle - bike 2

Dog 1 and Dog 2:

A dog - dog 1

A dog - dog 2

We can use the "Sentence Transformers" Python library to work with the catchily titled 'clip-ViT-B-32' model ('the Image & Text model that maps text and images to a shared vector space') to compare and label images. FYI, when comparing images, the CLIP model gives a score between 0 and 1, with 1 indicating that two images are identical. So, what do we get?
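A minimal sketch of how such a comparison could look with Sentence Transformers follows. It is not the exact code behind this post: the helper names (`embed_images`, `compare_all`) are made up for illustration, and it assumes the score is the cosine similarity between two image vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_images(paths, model_name="clip-ViT-B-32"):
    """Encode image files into CLIP vectors (the model downloads on first use)."""
    # Imported lazily so the helpers in this sketch work without these packages.
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    images = [Image.open(p) for p in paths]
    return [vec.tolist() for vec in model.encode(images)]


def compare_all(names, vectors):
    """Return {(name_a, name_b): score} for every distinct pair of images."""
    scores = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            scores[(names[i], names[j])] = cosine_similarity(vectors[i], vectors[j])
    return scores
```

With the four test images saved locally (the file names here are placeholders), `compare_all(["bike 1", "bike 2", "dog 1", "dog 2"], embed_images(["bike1.jpg", "bike2.jpg", "dog1.jpg", "dog2.jpg"]))` would return one score per pair.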

Similarity Scores:

Labelling:
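Labelling with CLIP is usually done "zero-shot": the candidate labels are written as short text prompts, encoded with the same model as the image, and ranked by how close each prompt sits to the image in the shared vector space. The sketch below assumes that approach; `rank_labels` and `label_image` are hypothetical helper names, and the prompts in the usage note are examples only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_labels(image_vec, label_vecs, labels):
    """Sort labels from best to worst match for a single image vector."""
    scored = [
        (cosine_similarity(image_vec, vec), label)
        for vec, label in zip(label_vecs, labels)
    ]
    return [(label, score) for score, label in sorted(scored, reverse=True)]


def label_image(path, labels, model_name="clip-ViT-B-32"):
    """Encode one image and the candidate text labels, then rank the labels."""
    # Imported lazily so rank_labels can be used without these packages.
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    image_vec = model.encode(Image.open(path)).tolist()
    label_vecs = model.encode(labels).tolist()
    return rank_labels(image_vec, label_vecs, labels)
```

For example, `label_image("dog1.jpg", ["a dog with black fur", "a dog with white fur"])` (placeholder file name) would return both prompts with their scores, best match first.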

Based on this very small test, the model seems to have potential and is worth investigating further.

Comparing Book and Album Covers:

This got me thinking about how else OpenAI’s CLIP could be applied. So, I downloaded around 100 album cover images from https://musicbrainz.org/ and Wikipedia's list of best-selling music artists, and around 300 book cover images from OpenLibrary and Wikipedia's list of best-selling books, and wrote some code to explore how CLIP could be used to browse collections of books and albums.

Could CLIP be used to browse these books and albums based purely on the images? Would this work and make sense? Might it throw up something interesting? Or would it be completely nonsensical?

So, I ended up writing something that allows browsing by semantic similarity (according to CLIP), by colour, and by 'PHash' (perceptual hash).

Let's have a look at a few examples:

Books:

Images semantically similar to Michelle Obama's book cover (according to CLIP):

I'd say CLIP has done fairly well here, identifying similar books by or about Michelle and Barack Obama. I wonder whether the Mariah Carey book crept in because of the similarity between "Becoming" and "Meaning". The other two books... er, not too sure about those. Maybe the structure was similar. 🤷
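Browsing "by meaning" like this boils down to a nearest-neighbour search: every cover is turned into a CLIP vector once, and the covers closest to the chosen one are shown. Here is a rough sketch of that lookup, assuming the vectors are already computed and held in a dictionary keyed by a cover id (the function names and ids are illustrative, not taken from the actual browser).

```python
import heapq
import math


def normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]


def most_similar(query_id, embeddings, k=5):
    """Return the k (id, score) pairs closest to embeddings[query_id]."""
    unit = {item_id: normalise(vec) for item_id, vec in embeddings.items()}
    query = unit[query_id]
    scores = (
        (sum(q * x for q, x in zip(query, vec)), item_id)
        for item_id, vec in unit.items()
        if item_id != query_id  # don't return the cover we started from
    )
    return [(item_id, score) for score, item_id in heapq.nlargest(k, scores)]
```

A linear scan like this is fine for a few hundred covers; a much larger collection would probably want a proper vector index instead.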

Images similar by colour to Michelle Obama's book cover:

It certainly found images that are similar in colour. How useful that is, is a different question 😜
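The post doesn't say how colour similarity was measured, so the following is only one plausible way to do it: reduce each cover to a coarse RGB histogram and compare histograms by their overlap ("histogram intersection"). All function names and the bin count are assumptions made for this sketch.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised coarse RGB histogram from (r, g, b) tuples with values 0-255."""
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in the range 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
        count += 1
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for no overlap at all."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def load_pixels(path, size=(64, 64)):
    """Read an image file as a list of (r, g, b) tuples, shrunk for speed."""
    from PIL import Image  # imported lazily; only needed for real image files

    img = Image.open(path).convert("RGB").resize(size)
    data = img.tobytes()
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
```

A histogram ignores where colours appear on the cover, which fits the observation that the matches share a palette but not much else.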

Images similar by 'PHash' to Michelle Obama's book cover:

OK, so "Perceptual Hash" is apparently matching "structure, tone, and general layout". Hmmm. 🤔
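To make "structure, tone, and general layout" a bit more concrete, here is a pure-Python sketch of the usual DCT-based perceptual hash recipe: shrink the image to a small grayscale square, keep only the low-frequency part of its discrete cosine transform, and record which coefficients sit above the median. Two covers are then compared by counting differing bits. Which library or variant the post actually used isn't stated, so treat the details (32x32 input, 64-bit hash, function names) as assumptions.

```python
import math
import statistics


def phash_bits(gray, hash_size=8):
    """Perceptual hash bits for a square grayscale matrix (a list of rows)."""
    n = len(gray)
    # Cosine table for the first hash_size DCT-II basis functions.
    cos = [
        [math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        for k in range(hash_size)
    ]
    coeffs = []
    for u in range(hash_size):          # vertical frequency
        for v in range(hash_size):      # horizontal frequency
            total = 0.0
            for y in range(n):
                row, cy = gray[y], cos[u][y]
                for x in range(n):
                    total += row[x] * cy * cos[v][x]
            coeffs.append(total)
    median = statistics.median(coeffs)
    return [1 if c > median else 0 for c in coeffs]


def hamming_distance(bits_a, bits_b):
    """Number of positions where two hashes differ (0 means same hash)."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def load_gray(path, size=32):
    """Read an image file as a size x size grayscale matrix."""
    from PIL import Image  # imported lazily; only needed for real image files

    img = Image.open(path).convert("L").resize((size, size))
    data = img.tobytes()
    return [list(data[r * size:(r + 1) * size]) for r in range(size)]
```

Because only coarse light/dark layout survives, two covers with nothing in common semantically can still end up a few bits apart, which may explain the odd-looking matches.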

Albums:

Images semantically similar to the cover of Bob Marley's Greatest Hits (according to CLIP):

Again, CLIP has done a pretty good job here, pulling in a bunch of other Bob Marley albums. Quite how "Sgt Pepper" is related is a bit of a mystery though. 🤷

Images similar by colour to the cover of Bob Marley's Greatest Hits:

Browsing by colour hasn't worked quite as well as it did with Michelle Obama's book cover; it seems to have got fixated on red rather a lot.

Images similar by 'PHash' to the cover of Bob Marley's Greatest Hits:

"Perceptual Hash" is just rather surreal really. 🤪

My Conclusions:

Couple of Real-World Uses of CLIP:

If you're interested in reading a bit more about how others have used CLIP, here are a couple of examples:

What's Next?

I'm going to look into whether OpenAI’s CLIP could be useful in identifying stolen bikes. We have:

Could we download a bunch of images and compare them and maybe find stolen bikes for sale? All will be revealed in the next blog post ... 🥳
