Image Recognition With Open Source Technology
In this document, we will see whether a suitable dataset, a workflow and existing scripting skills are sufficient to develop an automatic image recognition process. The intention is to tabulate common hazards, signs and issues seen in the field by non-profit case workers who serve the chronically homeless.
ImageAI, OpenCV, TensorFlow
For about the past five years, the average developer has been able to build a reasonable image recognition model to predict or detect properties of images without an exorbitant amount of resources. Google and Facebook opened the floodgates by releasing open source (i.e. free!) tools and models trained on the data they have amassed from their users. The reason they liberate this data is twofold:
(1) Sharing the data allows Facebook and Google to crowdsource solutions to their internal problems without having to pay for them directly.
(2) There is likely a tax benefit, alongside a public relations one, to sharing a sliver of all the data they produce internally. This is purely speculation on my part, of course.
Automatically recognizing images depends on having enough training data. This training data consists of annotated images in which regions of each image are marked as objects.
Fortunately, with ImageAI, OpenCV, TensorFlow, etc., the average developer can easily train an image recognition model, so long as enough photos exist and they are annotated.
Annotating Data: Basic Concepts
What does annotation mean? Each image has a delimited zone that is annotated with a label. After training on many such images, the image recognition model can recognize similar content in new images. The amount of data needed depends on how specific the label is. For instance, it takes less data to identify a person than to distinguish between two specific people. Likewise, it is easier to identify something as a bird than as a specific type of bird. The more specific the prediction, the more data is needed.
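To make the idea of a "delimited zone with a label" concrete, here is a minimal sketch of what one annotation might look like once exported from a labeling tool, assuming a simple bounding-box format (pixel coordinates of the top-left and bottom-right corners). The `Annotation` class, its field names and the sample file paths are hypothetical and do not match the export format of any particular tool. Counting zones per label is a quick way to see whether a label has enough examples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One labeled zone in one image (hypothetical format)."""
    image_path: str
    label: str
    # Bounding box in pixels: (x_min, y_min) is the top-left corner,
    # (x_max, y_max) is the bottom-right corner.
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self):
        # Reject empty or inverted boxes, which usually mean a labeling mistake.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")

    def area(self) -> int:
        """Number of pixels covered by the zone."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def count_per_label(annotations):
    """Return how many annotated zones exist for each label."""
    return Counter(a.label for a in annotations)


annotations = [
    Annotation("birds/001.jpg", "bird", 40, 30, 220, 190),
    Annotation("birds/002.jpg", "bird", 10, 15, 120, 140),
    Annotation("people/003.jpg", "person", 0, 0, 300, 600),
]

print(count_per_label(annotations))
print(annotations[0].area())
```

A more specific label (a particular species instead of "bird") would simply appear as its own label here, and by the reasoning above it would need a larger count before training.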
Example Of An Annotation:
In the bird image below, the actual bird is identified by highlighting a zone of the image with the labelbox.com tool. While you and I clearly see the bird as the most salient feature of the image, a computer must be ‘trained’ on many images in which that feature has been marked. Each zone is a set of pixels that, over many images, the model learns to identify from geometric properties that are far beyond my understanding. The important part is that free software is available to provide shortcuts for this type of image processing.
Model Output
Each image fed into an image recognition model for training must have the area or feature to be recognized ‘annotated’, i.e. marked off as the relevant spot or object. In the example below, we can see that an image recognition model can distinguish very fine details. Each area is delimited and clearly marked with a label from {‘hazelnut’, ‘fig’, ‘date’}; the percentage score indicates how ‘certain’ the model is of the label.
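In an application, the percentage score is what decides whether a detection is acted on. A minimal sketch, assuming detections arrive as a list of dictionaries with `name`, `percentage_probability` and `box_points` keys. I believe this is roughly the shape ImageAI's object detection returns, but treat the keys as an assumption. The sample values and the 50% cut-off are made up for illustration.

```python
def confident_detections(detections, min_percentage=50.0):
    """Keep detections whose certainty meets the threshold, most certain first."""
    kept = [d for d in detections if d["percentage_probability"] >= min_percentage]
    return sorted(kept, key=lambda d: d["percentage_probability"], reverse=True)


# Hypothetical detections for one image; box_points are (x1, y1, x2, y2) in pixels.
sample = [
    {"name": "hazelnut", "percentage_probability": 77.5, "box_points": [200, 60, 245, 104]},
    {"name": "fig", "percentage_probability": 91.2, "box_points": [12, 40, 118, 150]},
    {"name": "date", "percentage_probability": 32.0, "box_points": [300, 20, 360, 140]},
]

for d in confident_detections(sample):
    print(f'{d["name"]}: {d["percentage_probability"]:.1f}% at {d["box_points"]}')
```

Raising `min_percentage` trades missed objects for fewer false labels; the right value depends on how costly a wrong label is for the case worker.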
Using Free Software For ‘The Swag’ Image Recognition
TBD: Python code for training a model on “the SWAG” photos. 🙂
Using photos from public events, we will build an image recognition process to distinguish between SWAG employees and non-SWAG law enforcement employees. More often than not, full-body information, such as lettering on clothing, is available, and our image recognition process can rely on this data. For every label we want to identify, we need at least 32 images. For example, if we want to identify a fire truck, we would create a directory with 32 different images of that fire truck in our shared repository. This is the type of workflow that will eventually result between
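Before training, it helps to check that every label folder actually meets the 32-image minimum. A minimal sketch, assuming one sub-folder per label under a training directory (the `people/train/cops/` path used in the code below suggests this layout). The function names and the set of accepted file extensions are my own choices, not part of any library.

```python
from pathlib import Path

MIN_IMAGES_PER_LABEL = 32  # the per-label minimum used in this document
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your photos


def images_per_label(train_dir):
    """Map each label folder under train_dir to the number of images it holds."""
    counts = {}
    for label_dir in sorted(Path(train_dir).iterdir()):
        if label_dir.is_dir():
            counts[label_dir.name] = sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def labels_needing_photos(train_dir, minimum=MIN_IMAGES_PER_LABEL):
    """Return {label: number_of_missing_images} for labels below the minimum."""
    return {
        label: minimum - n
        for label, n in images_per_label(train_dir).items()
        if n < minimum
    }
```

Calling `labels_needing_photos("people/train")` would then list which folders still need photos, for example ones to fill from online sources.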
In Python, we can test how viable any of the freely available models are (one of them was developed by Google). As a first attempt, using the material available within a non-profit’s business, we have put together a few folders of SWAG photos that qualify as proper training material. Wherever there was a gap, we found a few photos online.
The code is below. At this link, readers can review the details of the code and a Python script that can be passed along to anyone looking for performance details on the data tested.
The actual code:
from imageai.Classification import ImageClassification

# Load a ResNet50 model pretrained on ImageNet (the weights file is downloaded separately)
prediction = ImageClassification()
prediction.setModelTypeAsResNet50()
prediction.setModelPath("resnet50_imagenet_tf.2.0.h5")
prediction.loadModel()

# Ask for the five most likely labels for one photo from the training folder
predictions, probabilities = prediction.classifyImage("people/train/cops/IMG_3743.jpg", result_count=5)

# Probabilities are reported as percentages
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)
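The script above classifies a single photo. To judge whether a label is applied consistently across a whole folder, one option is to wrap the same ImageAI calls in a function and count the top label per image. A minimal sketch: `tally_top_labels` and `make_resnet50_classifier` are hypothetical helper names, the ImageAI calls are only the ones already used above, and their behavior may differ between ImageAI versions.

```python
from collections import Counter
from pathlib import Path


def tally_top_labels(image_paths, classify):
    """Count how often each label is the model's top guess across images.

    `classify` is any function that takes an image path and returns
    (label, probability) pairs sorted from most to least likely.
    """
    tally = Counter()
    for path in image_paths:
        results = classify(path)
        if results:
            top_label, _ = results[0]
            tally[top_label] += 1
    return tally


def make_resnet50_classifier(model_path="resnet50_imagenet_tf.2.0.h5", result_count=5):
    """Wrap the ImageAI calls from the script above in a reusable function."""
    # Imported here so the tally logic can be used without ImageAI installed.
    from imageai.Classification import ImageClassification

    prediction = ImageClassification()
    prediction.setModelTypeAsResNet50()
    prediction.setModelPath(model_path)
    prediction.loadModel()

    def classify(path):
        predictions, probabilities = prediction.classifyImage(str(path), result_count=result_count)
        return list(zip(predictions, probabilities))

    return classify


# Example (requires ImageAI and the model file):
# classify = make_resnet50_classifier()
# print(tally_top_labels(sorted(Path("people/train/cops").glob("*.jpg")), classify))
```

Passing the classifier in as a function keeps the counting logic independent of ImageAI, so another model could be swapped in without changing it.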
We see no need to reinvent the wheel: a pretrained TensorFlow model (here ResNet50, trained on the ImageNet dataset) can look through images and derive a label. Below, we copy-paste the output from running the script above on two SWAG images; the first five lines come from one image and the last five from the other, with probabilities in percent.
military_uniform : 8.621860295534134
Newfoundland : 5.858585610985756
mountain_bike : 5.305056273937225
bulletproof_vest : 4.944176599383354
dogsled : 4.822985082864761

bulletproof_vest : 18.651099503040314
cash_machine : 16.095666587352753
jean : 10.59015840291977
crutch : 4.283485934138298
miniskirt : 3.1234027817845345
Oftentimes, the model will output candidate labels that have nothing to do with the image. Labels like ‘miniskirt’ appear because of limits in the model’s accuracy. Our solution is to understand the model’s weaknesses. For instance, ‘military uniform’ may surface a lot even though the average person would say ‘that’s actually a police uniform’; the pretrained model can only choose among the 1,000 ImageNet categories it was trained on, and police uniform is not one of them. However, if the label is consistently applied to police and military uniforms, there is no harm in accepting that the model will always identify a police uniform as a military uniform and designing application behavior around that label.
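One way to design around those labels is a small translation layer between the model and the application. A minimal sketch, assuming the application only cares about a few categories of its own: model labels go through a hand-written mapping, low-scoring or unmapped labels are dropped, and the scores of labels that map to the same category are added together. The category names, the mapping and the 5% cut-off are illustrative assumptions; the inputs are the two outputs listed above.

```python
# Hypothetical mapping from ImageNet labels (what the pretrained model knows)
# to categories our application cares about. "military_uniform" stands in
# for police uniforms, as discussed above.
LABEL_MAP = {
    "military_uniform": "law_enforcement",
    "bulletproof_vest": "law_enforcement",
    "jean": "street_clothes",
    "miniskirt": "street_clothes",
}


def to_app_categories(predictions, probabilities, min_percentage=5.0, label_map=LABEL_MAP):
    """Translate model labels into application categories.

    Labels scoring below min_percentage or missing from label_map are ignored.
    Scores of labels sharing a category are summed, since the model's
    per-class probabilities describe mutually exclusive classes.
    Returns {category: total_percentage}, highest first.
    """
    totals = {}
    for label, pct in zip(predictions, probabilities):
        category = label_map.get(label)
        if category is None or pct < min_percentage:
            continue
        totals[category] = totals.get(category, 0.0) + pct
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# The two outputs listed above.
first = (["military_uniform", "Newfoundland", "mountain_bike", "bulletproof_vest", "dogsled"],
         [8.621860295534134, 5.858585610985756, 5.305056273937225,
          4.944176599383354, 4.822985082864761])
second = (["bulletproof_vest", "cash_machine", "jean", "crutch", "miniskirt"],
          [18.651099503040314, 16.095666587352753, 10.59015840291977,
           4.283485934138298, 3.1234027817845345])

print(to_app_categories(*first))
print(to_app_categories(*second))
```

Irrelevant labels such as ‘dogsled’ or ‘cash_machine’ never reach the application because they are not in the mapping, and ‘miniskirt’ is dropped by the cut-off.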