Computer Vision: Image Understanding

This article is part of our “Data Science Digest” series. With this series, we will help you keep up with the developments in Data Science, show you the potential of data science techniques and give you a sneak peek into some of the exciting things we’ve been working on with the Data & AI team at Eraneos. In this article, we will explore some of the potential applications of Computer Vision in advertising and other fields.

Computer Vision is one of the fields with the widest range of potential applications in our daily lives. Unfortunately, building a technology that works reliably from beginning to end is extremely difficult. At its core, the purpose of Computer Vision is image understanding. Note the use of the word ‘understanding’. It implies several things: three, to be precise.

First and foremost, we’ve had to reverse-engineer how our eyes work (difficult). Think of technologies such as cameras and sensors, which have been under constant development for several decades. Second, we’ve had to teach computers how to describe what they see (very difficult). The human brain, for example, is built to see and understand; some research indicates that around 60% of the brain is ‘involved’ in the process of vision. Third and finally, we’ve had to teach computers to understand (hair-tearingly difficult). Consider how simple it is for you to see an apple and, in a split second, receive thousands of signals telling you about its shape, angle, color, taste and weight, whether it is edible, how you personally feel about apples, what kind of apples you prefer, and maybe an anecdote about apples that your great-grandmother told you 15 years ago. Now, try to engineer that for thousands of objects.

This is not meant to discourage you. We’ve actually made great strides in the development of Computer Vision, and the technology is probably more widely used than you imagine. Among other things, it is used for face detection in cameras, terrain detection in drones and airplanes, vehicle license plate recognition, 3D technology, augmented reality and MRI scans, and most recently in conjunction with deep learning to improve accuracy in fields such as medicine and self-driving cars.

At Eraneos, we’ve also worked on some interesting cases using Computer Vision. By analyzing pictures and training a neural network, we’ve been able to automate radiology image analysis for classifying tumors and fractures. For another client, we analyzed what kinds of images lead to a higher conversion rate when used on a vacation (leisure) website. More recently, we’ve been working with TV and digital advertisers, applying computer vision techniques to predict consumers’ responses to commercials before they are even launched. So, how did we do it?

Using Python, we’ve created neural networks that analyze a video frame by frame and recognize and categorize more than 2,000 objects (such as people, buildings, cars, landscapes and cities), as well as more specific characteristics such as gender, age, faces and emotions. All of this information is correlated with cues taken from historical focus group data in order to understand people’s attitudes towards certain objects or events. This has allowed us to score and rank new commercials before they are even launched.
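To make the idea of frame-by-frame scoring more concrete, here is a minimal sketch in plain Python. It is not our production code, and it rests on several assumptions: `detect_labels` is a hypothetical stand-in for a trained neural network, the frames would in practice come from a video decoder, and the simple weighted sum with made-up `toy_weights` merely stands in for whatever relationship is learned from focus group data.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def label_shares(
    frames: Iterable[object],
    detect_labels: Callable[[object], Iterable[str]],
    every_nth: int = 1,
) -> Dict[str, float]:
    """Return, per label, the share of sampled frames in which it was detected.

    `detect_labels` is a hypothetical callable standing in for a trained model.
    """
    counts: Counter = Counter()
    sampled = 0
    for index, frame in enumerate(frames):
        if index % every_nth != 0:
            continue  # only look at every n-th frame
        sampled += 1
        counts.update(set(detect_labels(frame)))  # count a label once per frame
    if sampled == 0:
        return {}
    return {label: n / sampled for label, n in counts.items()}


def score_commercial(
    shares: Dict[str, float], label_weights: Dict[str, float]
) -> float:
    """Weighted sum of label shares (an assumed, deliberately simple model)."""
    return sum(
        label_weights.get(label, 0.0) * share for label, share in shares.items()
    )


# Toy stand-in data: each "frame" is already the list of labels a model
# might return, so the detector below simply passes the frame through.
toy_frames = [["person", "car"], ["person"], ["beach"], ["person", "beach"]]

# Hypothetical weights; real ones would be derived from focus group data.
toy_weights = {"person": 0.5, "beach": 1.0, "car": -0.2}

shares = label_shares(toy_frames, detect_labels=lambda frame: frame)
score = score_commercial(shares, toy_weights)
print(shares)
print(round(score, 3))
```

Computing such a score for several commercials and sorting by it gives a ranking. Working with per-label shares rather than raw counts keeps commercials of different lengths comparable.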

One limitation we’ve faced is video analysis. Although the technology is being developed, we are currently only able to analyze static images (which is why videos are broken down frame by frame), while it would also be very interesting to analyze the correlation between images. Once this is in production, you could in theory subdivide videos into content-based classes and make them searchable. This means you could, for example, search for all fragments that relate to a certain theme, or find all timestamps within a series of videos at which a specific person appears and trace how this character ‘develops’ over time. Going even deeper, in combination with Speech to Text and Natural Language Processing techniques, you could in theory recognize humor and sarcasm and search for specific conversation topics within a video.
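As a rough illustration of what ‘searchable’ could mean here, the sketch below turns per-frame labels into time segments per label. It is an assumption-laden toy, not a description of a finished system: the function name `label_segments`, the `max_gap` parameter and the example labels are all hypothetical, and the per-frame labels are assumed to come from an image model that has already been run on every frame.

```python
from typing import Dict, List, Sequence, Tuple


def label_segments(
    frame_labels: Sequence[Sequence[str]],
    fps: float,
    max_gap: int = 0,
) -> Dict[str, List[Tuple[float, float]]]:
    """Map each label to the (start_s, end_s) intervals in which it appears.

    frame_labels[i] holds the labels detected in frame i. Appearances that are
    separated by at most `max_gap` missing frames are merged into one segment.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    runs: Dict[str, List[List[int]]] = {}
    for index, labels in enumerate(frame_labels):
        for label in set(labels):
            label_runs = runs.setdefault(label, [])
            if label_runs and index - label_runs[-1][1] <= max_gap + 1:
                label_runs[-1][1] = index  # extend the current segment
            else:
                label_runs.append([index, index])  # start a new segment
    # Convert frame indices to seconds; a segment ends when its last frame ends.
    return {
        label: [(start / fps, (end + 1) / fps) for start, end in label_runs]
        for label, label_runs in runs.items()
    }


# Toy example: five frames at 2 frames per second, with hypothetical labels.
toy_frame_labels = [["anna"], ["anna"], [], ["anna", "car"], ["car"]]
segments = label_segments(toy_frame_labels, fps=2.0)
print(segments.get("anna", []))
```

Looking up a label in the resulting dictionary returns every fragment in which that person or object appears. Raising `max_gap` tolerates frames where the model briefly misses the object, at the cost of less precise segment boundaries.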

By Yordan Tachev