Have you ever tried to dig into the machine learning world and enhance your application with some artificial intelligence? The more you dive into this topic, the more complex math symbols and statistical formulas you encounter. Fortunately, the big cloud providers are investing in this area as well, making AI services easy to use even for people without a Ph.D. in data science. One such product is Cognitive Services, offered by Microsoft Azure. This set of APIs and tools allows for implementing smart models for text, image or sound analysis. In this article, I will show how fast and easy it can be to create a custom image classifier. For the best learning experience we need a good real-life example, right?
You look like a typical…
Has anyone ever guessed your job before you told them? Would you be able to picture in your head a typical IT guy? Or a typical nurse? How are these stereotypes related to the actual situation in the job market?
Classification models have already helped us identify age, gender or emotions from a person’s photo. There have been even more controversial experiments, like predicting the likelihood of committing a crime. Would it be possible to label a person with a job title just by analyzing a business photo? With Custom Vision from Cognitive Services, we can quickly train a classification model. As always, let’s start with collecting the data.
Profiles and profiles
To run this experiment I downloaded over 60k publicly available profiles, mostly from the IT and healthcare sectors. Each profile contains a name, a job title, and a photo. Cleaning the data and unifying job titles was a surprisingly time-consuming process, as apparently people like long and fancy job titles on social media. I used Power BI to clean and transform the data, which was stored as JSON objects in a text file.
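To give a flavor of the title-unification step, here is a minimal Python sketch. The keyword mapping below is a hypothetical example for illustration only; the actual clean-up rules were built interactively in Power BI and are not shown in this article.

```python
import re

# Hypothetical keyword-to-canonical-title mapping (illustrative only).
TITLE_MAP = {
    "software": "Software Developer",
    "developer": "Software Developer",
    "nurse": "Nurse",
    "recruiter": "Recruiter",
    "talent acquisition": "Recruiter",
}

def unify_title(raw_title):
    """Map a long, fancy social-media job title to a canonical one."""
    # Lowercase and strip punctuation so keyword matching is robust.
    title = re.sub(r"[^a-z ]", " ", raw_title.lower())
    for keyword, canonical in TITLE_MAP.items():
        if keyword in title:
            return canonical
    return None  # unknown titles are dropped from the dataset
```

With this sketch, “Senior Software Engineer / Architect” collapses to “Software Developer”, while a title that matches no keyword is discarded.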
To get more insights about the people in the photos I used the Face API from Cognitive Services. First, it detects faces in a photo, so I could remove empty pictures or pictures with multiple people. Second, it returns many attributes extracted by the face analysis, like age, gender, hairstyle, etc. These attributes will be used to create training datasets for further models and to validate the correctness of the analysis. At this point, we’re ready to create the first model.
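The filtering logic can be sketched as below, assuming the Face API detection response has already been parsed into a Python list with one dict per detected face. The response shape is simplified here, and the HTTP call plus endpoint/key configuration are omitted; consult the Face API documentation for the full schema.

```python
def single_face_attributes(detected_faces):
    """Return the attributes of the face if exactly one face was
    detected; return None for empty photos or group photos."""
    if len(detected_faces) != 1:
        return None
    attrs = detected_faces[0]["faceAttributes"]
    # Keep only the attributes used to build the training subsets.
    return {
        "age": attrs["age"],
        "gender": attrs["gender"],
        "glasses": attrs["glasses"],
    }
```

Empty photos (zero faces) and group photos (multiple faces) both yield `None` and are excluded from the dataset.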
Simple questions first
With the Custom Vision portal, which wraps the Cognitive Services API in a handy web interface, building an image classifier may not require writing a single line of code. Let’s create a first classification project with a single tag per image. The classifier can be optimized for a specific domain, but from the available options (general, food, landmarks, retail) the general domain seems to be the only sensible choice. Does it mean that the model will be less accurate at categorizing profile photos? Will it catch the small details that differentiate pictures? The first task will be relatively easy, just to measure the performance of the model.
For the first try – 1,000 pictures of men and women.
These measures are high, but how to interpret them? Here comes the documentation:
Precision indicates the fraction of identified classifications that were correct. For example, if the model identified 100 images as dogs, and 99 of them were actually of dogs, then the precision would be 99%.
Recall indicates the fraction of actual classifications that were correctly identified. For example, if there were actually 100 images of apples, and the model identified 80 as apples, the recall would be 80%.
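The two definitions from the documentation translate directly into code. Here is a small sketch that reproduces the dog and apple examples from the quoted docs:

```python
def precision(true_positives, false_positives):
    """Fraction of identified classifications that were correct."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    """Fraction of actual classifications that were identified."""
    return true_positives / (true_positives + false_negatives)

# 99 of 100 images identified as dogs were really dogs -> 99% precision
# 80 of 100 actual apple images were identified        -> 80% recall
```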
So far the results are really good, but that was an easy one. The next dataset consists of two categories with only a small difference between them. With the Face API, I could detect whether a person wears glasses or not, so I created two subsets with 1,000 pictures each. Let’s see whether this generic classifier will be able to catch that difference.
That looks promising! So far we have shown that the model can capture a small difference in face attributes, so it can be used for further analysis. It is also extremely simple to train the classifier with the web interface: it only requires uploading the photos and specifying the categories, and the service does all of the work under the hood. On the other hand, there are almost no options to tune the model parameters, but this process is clearly designed for simplicity and works pretty well. Time for some more advanced testing.
IT or not IT?
For the first real test, I created three subsets of women working as software developers, nurses and recruiters. Now I’m going to pair up these groups and check the precision of the model.
Not great, not terrible. The best performing model was the one distinguishing software developers from nurses. It correctly identified almost 70% of the pictures at a 50% threshold. By increasing or decreasing the threshold we can gain higher precision or higher recall, but the AP (average precision) metric gives us the average performance of the model across different thresholds. Usually higher precision means lower recall, and vice versa.
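The precision/recall trade-off is easy to see in a toy sketch. The scores below are made up for illustration (they are not the actual model outputs): each pair is the predicted probability that a photo shows a software developer, and whether that is actually true.

```python
def precision_recall_at(threshold, scored):
    """scored: list of (predicted_probability, is_actually_positive)."""
    tp = sum(1 for p, y in scored if p >= threshold and y)
    fp = sum(1 for p, y in scored if p >= threshold and not y)
    fn = sum(1 for p, y in scored if p < threshold and y)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Made-up prediction scores for six photos.
scores = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.4, False), (0.3, True)]
```

At a 50% threshold this toy data gives precision 0.75 and recall 0.75; raising the threshold to 75% gives precision 1.0 but recall only 0.5. Averaging precision over many thresholds is, roughly speaking, what the AP metric summarizes.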
What is the key difference between these two datasets? Is it the hair color or maybe glasses? To find out let’s use the output from Face API and compare face attributes in Power BI.
Face to face meeting
The easiest way for me was to export the whole dataset to a text file and import it into Power BI Desktop. This free application allows you to import data from many sources and work with various data types. In my case transforming nested JSON objects was not a big issue, as you can add a custom column with a transformation script written in the Power Query M formula language. After preparing the table columns it is just a matter of dragging and dropping specific fields onto one of the many visual controls. Going back to the previous question – is there a visual difference between profiles from the nurses and software developers groups?
As you can see, Power BI Desktop makes it easy to quickly visualize a dataset. Key differences found? Age and hair color. Let’s extract the men’s profiles and continue with the analysis.
Choose your faction
Previously the datasets were grouped by a single job title. This time I would like to compare people from whole job sectors, so I am also going to add related jobs. This way the IT group will contain software developers, IT consultants and database administrators. Healthcare will contain nurses, doctors and clinical research specialists. The last group will be Human Resources, with recruiters, talent acquisition specialists and benefits specialists. All filtered to men only. Let’s see what we have here.
IT vs HR has the most interesting result, mainly because the model was quite precise (72%) at identifying IT people, but managed to identify only 58% of the whole category. Overall, we can see much lower accuracy, so extending the groups negatively affected the final result.
The vision is clear
Is it actually possible to assign job titles just by analyzing business photos? I wouldn’t say that was the point of this experiment. What I have learned is that Custom Vision is a really handy tool for image classification, even for non-technical people. Collecting and cleaning the data consumed around 80% of the time in this project; the rest was pure fun creating models and visualizing data in Power BI. Cloud providers are trying to make the machine learning world more approachable for non-scientists, and the results are truly great.
But what about stereotypes? They exist only in our heads, right? Let’s google “typical nurse meme” and classify the most popular photos. You can interpret the results on your own.