2020/10/25 07:54:56

Image recognition in deep learning

You probably hear about artificial intelligence often these days. Why now? Take image recognition: the basic flow is to process many images and find the parts they have in common in order to make a judgment. The theory has existed for a long time, but the processing took an enormous amount of time even on the supercomputers of the day, so it could not be put to practical use. Today, high-spec GPUs can do this processing very quickly, and the technology is spreading as a result. In this article, I would like to introduce image recognition.

What is "image recognition"?

■ Overview

"Image recognition" is a type of pattern recognition technology: a method for extracting, analyzing, identifying, recognizing, and detecting objects (characters, faces, etc.) and their features (shape, dimensions, count, brightness, color, etc.) from image data.

■ Basic explanation

Image recognition is one of the pattern recognition technologies; it identifies objects by capturing their features from images.

It works by finding the outline of an object in the image data, separating it from the background, extracting its features, performing matching and conversion, and then identifying and recognizing the target object and its features. In short, it analyzes what the object is.

Humans can easily identify a person or object in an image by drawing on experience to understand and judge "what is in the image?". People do this unconsciously, but for a computer, which handles information pixel by pixel, understanding what an image shows is extremely difficult and requires advanced, complicated processing.

Image recognition is a technology that lets a computer understand "what is in an image?", and it has received a great deal of attention in recent years.

■ What is pattern recognition?

"Pattern recognition" is a form of natural information processing: "the process of selecting and extracting objects that have certain rules and meanings from data containing miscellaneous information, such as images and sounds".

For the human brain, this is a natural process that infants and children acquire as they develop perceptual and linguistic competence. Realizing it artificially on a computer, however, is a high hurdle in terms of both accuracy and speed.

Main pattern recognition technology

In addition to image recognition, pattern recognition includes the following techniques.
・ Voice recognition --- Recognizes and extracts human speech from audio data and interprets it as language
・ Optical character recognition (OCR) --- Recognizes characters in image data and converts them into text data
・ Full-text search system --- Searches for documents by recognizing specific keywords in a large amount of document information, etc.

The main methods used in pattern recognition

The mainstream approach to pattern recognition today is non-rule-based: identification parameters are built from a large amount of data by machine learning. The main methods are as follows.
・ Neural network
・ SVM (Support Vector Machine)
・ K-nearest neighbor classifier
・ Bayes classification, etc.

■ Background

Computer-based image recognition research has been underway since the 1960s.

Since the 2000s, advances in deep learning within artificial intelligence (AI) and improvements in hardware performance have dramatically improved the accuracy of image recognition. With the advent of technology that matches or exceeds human performance on some recognition tasks, the field has made remarkable progress in recent years.

Furthermore, with the spread of smartphones and social networking services (SNS), a huge number of photos are taken every day, and image data is accumulating at an accelerating pace. To make use of this data, the need for computers to recognize images is expected to grow further.

Image recognition mechanism

■ Extracting things from pixel images

In order for a computer to perform image recognition, it is necessary to extract an object from an image as a preliminary step.

Computers treat image data as "a collection of per-pixel information (color tone, brightness, etc.), where the pixel is the smallest element that makes up a digital image". They therefore have to recognize an object within unstructured, noisy information.

In image recognition, the computer is made to understand what an image shows by extracting patterns from the image data, which is a set of pixels, and reading meaning from those patterns.
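To make the "set of pixels" idea concrete, here is a minimal sketch in Python. The tiny 4x4 grayscale image and the helper function names are made up for illustration; real images are far larger and usually carry three color channels per pixel.

```python
# A hypothetical 4x4 grayscale image. Each number is one pixel's
# brightness (0 = black, 255 = white). This grid is all the computer has.
image = [
    [10, 12, 200, 210],
    [11, 205, 208, 13],
    [9, 207, 12, 10],
    [12, 11, 10, 9],
]


def mean_brightness(img):
    """Average brightness over every pixel in the image."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def bright_pixel_coords(img, threshold=128):
    """Return (row, column) positions of pixels brighter than threshold."""
    return [
        (r, c)
        for r, row in enumerate(img)
        for c, p in enumerate(row)
        if p > threshold
    ]


print(mean_brightness(image))
print(bright_pixel_coords(image))
```

Nothing in the grid says "object"; any pattern, such as the cluster of bright pixels here, has to be derived from the numbers.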

■ Image processing

Before image recognition, image processing is performed to make it easier for the computer to recognize the image.

Image processing is generally performed in the following procedure.
1. Remove image noise, distortion, etc.
2. Adjust brightness and hue
3. Emphasize the outline of an object
4. Area extraction --- Cut out the area of the object from the image

Extracting the area makes it possible to handle the object at a consistent size, which makes image recognition easier.
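The four steps above can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: the image is a small grayscale grid, hue is ignored, and the function names, the median filter, the forward-difference edge measure, and the threshold of 128 are choices made for this example rather than a prescribed method. In practice a library such as OpenCV would normally do this work.

```python
def median_filter(img):
    """Step 1 (noise removal): replace each pixel with the median of its
    3x3 neighborhood. Windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            window = sorted(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            out[r][c] = window[len(window) // 2]
    return out


def adjust_brightness(img, gain=1.0, offset=0):
    """Step 2 (brightness): scale and shift each pixel, clamped to 0..255."""
    return [[max(0, min(255, int(p * gain + offset))) for p in row] for row in img]


def edge_strength(img):
    """Step 3 (outline): absolute difference to the right and lower neighbor."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = img[r][min(c + 1, w - 1)] - img[r][c]
            dy = img[min(r + 1, h - 1)][c] - img[r][c]
            out[r][c] = min(255, abs(dx) + abs(dy))
    return out


def extract_region(img, threshold=128):
    """Step 4 (area extraction): crop to the bounding box of bright pixels."""
    coords = [
        (r, c)
        for r, row in enumerate(img)
        for c, p in enumerate(row)
        if p > threshold
    ]
    if not coords:
        return []
    top = min(r for r, _ in coords)
    bottom = max(r for r, _ in coords)
    left = min(c for _, c in coords)
    right = max(c for _, c in coords)
    return [row[left:right + 1] for row in img[top:bottom + 1]]


# Hypothetical input: a dim 4x4 object on a black 8x8 background,
# plus one isolated noisy pixel in the top-right corner.
image = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        image[r][c] = 100
image[0][7] = 255

denoised = median_filter(image)                 # the noisy pixel disappears
adjusted = adjust_brightness(denoised, gain=2.0)
edges = edge_strength(adjusted)                 # nonzero only near the outline
region = extract_region(adjusted)               # 4x4 crop around the object

for row in region:
    print(row)
```

Note that the median filter also rounds off the corners of the object, a reminder that each preprocessing step trades some detail for robustness.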

Image recognition in deep learning

■ Specific object recognition

In specific object recognition, "a huge amount of training image data" and "the corresponding labels (information about what each image represents)" are registered in advance, and the system identifies what the object shown in an input image is.
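One simple way to picture "registered images with labels" is the k-nearest neighbor classifier listed among the main methods earlier. The sketch below is only an illustration: the 2x2 "images" flattened to four brightness values, the labels, and the function names are all invented for this example, and practical systems do not compare raw pixels this directly.

```python
import math
from collections import Counter

# Hypothetical registered data: (flattened 2x2 grayscale image, label).
training_data = [
    ([250, 240, 245, 255], "bright"),
    ([230, 250, 235, 240], "bright"),
    ([10, 20, 5, 15], "dark"),
    ([30, 10, 25, 20], "dark"),
    ([0, 35, 15, 10], "dark"),
]


def distance(a, b):
    """Euclidean distance between two equally sized pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(pixels, data, k=3):
    """Label the input by majority vote among its k closest registered images."""
    neighbors = sorted(data, key=lambda item: distance(pixels, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(classify([240, 245, 250, 235], training_data))
print(classify([20, 15, 10, 25], training_data))
```

The more labeled examples are registered, the more likely it is that an unseen input has close neighbors with the right label, which is one reason a "huge amount" of training data matters.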

■ Image recognition using deep learning

In image recognition using deep learning, a large amount of image data is read and features are extracted in fine detail. Because image recognition is a field in which features are difficult to specify by hand, deep learning's ability to learn them pays off, and the object recognition rate increases significantly.
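The article does not name a network architecture, but feature extraction in deep learning for images is commonly done with convolutional layers. As a rough sketch under that assumption, the code below slides one small filter over an image and keeps the positive responses. The kernel here is hand-picked to respond to dark-to-bright vertical edges; in deep learning, the kernel values would be learned from the training data instead. All names are hypothetical.

```python
def convolve2d(img, kernel):
    """Slide the kernel over the image (no padding) and take the sum of
    element-wise products at each position. Deep learning frameworks
    call this operation convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(
                img[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hand-picked filter: large output where brightness rises from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

feature_map = relu(convolve2d(image, vertical_edge))
for row in feature_map:
    print(row)
```

The output is large where the filter overlaps the dark-to-bright boundary and zero over the flat bright area, so the feature map marks where the edge is.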

Object identification by probability display

By learning the characteristics of objects from a large amount of image data, the system can express, as a probability, what an unknown object in a new image is.

Even for cases that are hard for humans to tell apart, such as a kitten and a lion cub, it can present the answer as probabilities.
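The article does not say how the probabilities are computed. A common approach in neural network classifiers is the softmax function, which turns the model's raw scores into values that sum to 1. The sketch below assumes that approach; the labels and scores are invented for illustration and do not come from a real model.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.
    Subtracting the maximum first keeps math.exp from overflowing."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a model for one input image.
labels = ["kitten", "lion cub", "dog"]
scores = [2.0, 1.5, -1.0]

probs = softmax(scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.1%}")
```

When two candidates receive similar scores, their probabilities are also close, so an ambiguous image is reported as ambiguous rather than forced into a single answer.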

Case Study
Image recognition technology enables personalization through data and technology, and many solutions have been put into practical use in a wide range of fields.

■ Security

・ Surveillance camera image analysis --- Detection of intruders and suspicious persons, matching against criminal databases
・ Face recognition system --- Iris recognition, immigration control at the airport

■ Character recognition

・ Character recognition technology (OCR)
・ Postal sorting machine
・ Real-time translation --- Ability to translate text in real time using a camera

■ Face recognition

・ Camera app
・ Smile shutter function of digital camera

■ Medical

・ Image diagnosis --- Diagnosis support from internal images taken by CT / MRI scan, etc.

■ Agriculture

・ Understanding the growth status of agricultural products

■ Industry

・ Product inspection --- Replacement for visual inspection, product defect inspection, detection of abnormal products
・ Manufacturing automation --- Parts selection, alignment during automatic mounting of parts on the board
・ Automatic driving assist technology

■ Marketing

・ Product recommendation --- Automatic classification of product images from user purchase history

Image recognition library
With the development of image recognition research by deep learning, image recognition technology has reached the practical stage in daily life, and many libraries for image recognition have appeared.

These libraries have reached a level where they can go beyond basic image recognition and learn to judge "what is the object?" and "what kind of situation is it?".

The following are typical image recognition libraries.

■ Deep learning library


OpenCV

OpenCV is an open source library originally developed and released by Intel, and is known as a representative image processing and image recognition library.

In addition to image recognition, it has advanced functions such as "image noise removal," "3D image processing," and "AR / VR support."


TensorFlow

TensorFlow is a machine learning and deep learning library for multi-layer neural networks, developed by Google.

Complex networks can be described in an easy-to-understand manner using data flow graphs, and can also be used for image recognition.


Caffe

Caffe is a deep learning library that is said to be particularly strong in image recognition and is characterized by high-speed processing.


Chainer

Chainer is a deep learning framework made in Japan, developed by Preferred Networks.

You can flexibly write and train neural networks in Python.

■ Cloud service

Watson APIs "Watson Visual Recognition"

"Watson Visual Recognition" is an image recognition function that uses IBM's Watson technology.

A trained model can be used, and recognition results can be obtained in multiple languages, including Japanese.

Google Cloud Platform "Google Cloud Vision API"

"Google Cloud Vision API" is an image recognition service that can be used on Google Cloud Platform. Many objects can be recognized using the trained model.

AWS "Amazon Rekognition"

"Amazon Rekognition" is an image / video analysis service that can be used on AWS.

You can easily add image analysis to your application to detect a variety of things (objects, people, text, scenes, activities, inappropriate content, etc.).

Azure "Computer Vision API"

"Computer Vision API" is an image recognition service that can be used in Azure.

You can extract a wealth of information from images and classify / process visual data.

It returns labeled information about the visual content of an image, and automatic restriction of inappropriate content can be enabled.