Machine Learning
Testing the OpenAI CLIP Model for Food Type Recognition with Custom Data
To better understand recent advances in computer vision, we investigate three machine learning model architectures and compare their ability to learn and identify complex visual concepts. Specifically, using custom datasets of different food types, we test the performance of two models (zero-shot and linear probe) built on OpenAI's Contrastive Language–Image Pre-training (CLIP) neural network against a ResNet50 model. We find that the CLIP linear probe model delivers the most accurate results in every instance, by a notable margin, while the relative ranking of CLIP zero-shot and ResNet50 depends on the nature of the dataset and the tuning applied.
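To make the two CLIP-based approaches concrete, below is a minimal sketch of both. Zero-shot classification scores an image against text prompts with no task-specific training, while a linear probe fits a simple classifier on frozen CLIP image features. This assumes the openai/CLIP package (`pip install git+https://github.com/openai/CLIP.git`); the image paths and food labels are hypothetical placeholders, not the article's actual datasets.

```python
import torch
import clip
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# --- Zero-shot: score one image against text prompts, no training ---
labels = ["pizza", "sushi", "salad"]  # hypothetical food classes
image = preprocess(Image.open("pizza.jpg")).unsqueeze(0).to(device)
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-to-text similarity logits
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()
print("zero-shot prediction:", labels[probs.argmax()])

# --- Linear probe: fit a logistic regression on frozen CLIP features ---
def embed(paths):
    """Encode a list of image paths into CLIP image-feature vectors."""
    feats = []
    with torch.no_grad():
        for p in paths:
            img = preprocess(Image.open(p)).unsqueeze(0).to(device)
            feats.append(model.encode_image(img).float().cpu().numpy())
    return np.concatenate(feats)

# Hypothetical training set: image paths paired with integer class labels
train_paths, train_labels = ["pizza1.jpg", "sushi1.jpg"], [0, 1]
clf = LogisticRegression(max_iter=1000)
clf.fit(embed(train_paths), train_labels)
```

The key design difference is that the zero-shot model relies entirely on CLIP's pre-trained image-text alignment, whereas the linear probe adds a small supervised layer trained on the custom data, which is consistent with the accuracy gap reported above.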