Artificial intelligence is a broad term, but among its rapidly advancing areas in recent years, deep learning has made especially remarkable progress. This time, I would like to introduce an article that explains deep learning in a little more detail.
What is deep learning? | Basic knowledge, mechanisms, application examples, and differences from machine learning
In recent years, AI (artificial intelligence) has been rapidly adopted as a technological innovation in various fields. According to a survey by ITR Co., Ltd., sales in the six major AI markets in FY2018 totaled 19.95 billion yen, a significant increase of 53.5% from the previous year. The market is expected to keep growing steadily and to reach 64 billion yen in 2023. The technology that supports this development is "deep learning". In this article, we speak with Associate Professor Toshihiko Yamazaki of the Graduate School of Information Science and Technology, the University of Tokyo, and explain deep learning in detail, from its mechanism to application examples.
Deep learning is a machine learning method that enhances expressive power and learning ability by connecting neural networks in multiple layers.
Simply stacking multiple layers led to problems such as a lack of expressiveness and overfitting, but these were resolved through techniques such as the Dropout method and ReLU, together with the availability of big data.
Currently, it is the most commonly used algorithm for constructing AI.
We are now said to be in the third AI boom. The technology that became the breakthrough for this third boom is deep learning.
In 2012, at the global image recognition competition "ILSVRC", "Super Vision", developed by the University of Toronto, won with overwhelming accuracy, beating famous research institutes such as the University of Tokyo and Oxford University, and sent shockwaves through the artificial intelligence research world.
The technology called "** Autoencoder", developed by Professor Geoffrey Hinton of the University of Toronto and others, makes it possible for the neural network itself to capture features. The learning method using this * multi-layer neural network, which was also used in "Super Vision", has come to be called "deep learning".
* Neural network consisting of "input layer", "hidden layer (intermediate layer)" and "output layer"
** A method of adjusting the weight parameter so that the value of the output layer of the neural network is the same as that of the input layer.
Differences and relationships between deep learning and artificial intelligence / machine learning
"Machine learning" is a term that is often heard together with "deep learning". Here we explain the differences between the two, which are easily confused and misunderstood, and what artificial intelligence (AI) is in the first place.
Artificial intelligence (AI ... Artificial Intelligence) is described in the dictionary as "a computer system equipped with the functions of human intelligence such as learning, reasoning, and judgment." (Excerpt from Daijirin 3rd Edition)
However, from an academic point of view, the term "artificial intelligence (AI)" is ambiguous, and different people understand it differently. At present, there is no clear definition of artificial intelligence even among experts.
Machine learning is a technology in which a computer learns from a large amount of data and automatically builds algorithms and models that perform tasks such as classification and prediction.
In addition to neural networks, there are various technologies and algorithms that make AI work, such as the "nearest neighbor method," "decision tree," and "support vector machine."
Proper use of deep learning and machine learning
The difference between deep learning and other machine learning is whether the features are "automatically learned by the machine" or "manually specified by humans".
Therefore, conventional machine learning is often used when the available data is limited and structured. Deep learning, on the other hand, is often used with complex unstructured data, and is applied to fields such as "speech recognition," "image recognition," and "natural language processing."
How neural networks work
How does deep learning actually work? Here, we explain the mechanism of neural networks, the framework on which deep learning is built.
First, the data is fed into the input layer in the form of * features, which are indicators for recognizing the data. Each input is multiplied by a weight w1, w2, ..., which corresponds to the connection strength between nerve cells, and is passed to the neurons in the output layer.
The neurons in the output layer pass the sum of these weighted inputs through the ** activation function and output the final result. This flow from input to output is called a "perceptron". A neural network is made up of a combination of multiple perceptrons.
* A numerical representation of a characteristic of the training data
** Non-linear or identity function applied after linear transformation in neural network
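As a rough illustration of the perceptron described above, the following Python sketch computes the weighted sum of the inputs and passes it through an activation function. The bias term b, the step activation, and the specific weight values are assumptions chosen for illustration (they make the perceptron behave like a logical AND) and do not come from the interview.

```python
import numpy as np

def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return np.where(z >= 0, 1, 0)

def perceptron(x, w, b, activation=step):
    """Weighted sum of the inputs followed by an activation function."""
    return activation(np.dot(x, w) + b)

# Hypothetical weights and bias (assumed values, not from the article).
w = np.array([1.0, 1.0])
b = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = perceptron(inputs, w, b)
print(outputs)  # [0 0 0 1]
```

Only the input (1, 1) produces a weighted sum large enough to exceed the threshold, so only that input yields 1.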
In deep learning, giving the neural network multiple intermediate layers lets the computer itself determine the features.
--Yamazaki
"For example, if you have multiple layers, one layer can think about color, another can think about shape, and so on. Deep learning can automatically learn what is important. It is more accurate than using features designed by humans."
Learning methods for deep learning
We would like to introduce two learning methods for deep learning, which are currently being actively researched: "Pre-train & Fine-tune" and "multimodal learning".
"Pre-train & Fine-tune" is a learning method that enables advanced analysis by first learning general image information and then transferring that knowledge to training on images in a specialized field.
--Yamazaki
"For example, when you want to analyze medical images, it is difficult to collect enough medical images for training. Therefore, it is common to first train on the wide variety of images available on the Internet, so that the model understands what an image is. On top of this, by additionally learning medical images as specialized knowledge, the model becomes able to analyze specialized images."
"Multimodal learning" is a mechanism in which AI learns using multiple types of data.
--Yamazaki
"For example, take images, audio, and text. First, train on images with images, audio with audio, and text with text. Then stop training once, connect the three trained results, train again, and propagate the learning result (loss) back through the whole.
In other words, it's a way to learn images, audio, and text both individually and as a whole. I think deep learning has not only improved recognition accuracy, but has also made a big contribution by breaking down the barriers between traditional fields such as image, audio, and language, allowing people to move freely between them."
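The "connect the three" step can be pictured as follows. This is a structural sketch only: the single-layer encoders, the feature sizes, and the simple concatenation used to join the modalities are all assumptions, and the training loops are omitted. In practice each encoder would first be trained on its own modality, and the joint loss computed at the end would then be propagated back through the head and all three encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, out_dim):
    """Hypothetical single-layer encoder standing in for a modality-specific network."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

# Hypothetical feature sizes for each modality.
encode_image = make_encoder(64, 8)
encode_audio = make_encoder(32, 8)
encode_text = make_encoder(16, 8)

def fuse(image, audio, text):
    """Concatenate the per-modality representations into one joint vector."""
    return np.concatenate(
        [encode_image(image), encode_audio(audio), encode_text(text)], axis=1
    )

batch = 4
joint = fuse(
    rng.normal(size=(batch, 64)),
    rng.normal(size=(batch, 32)),
    rng.normal(size=(batch, 16)),
)

# A joint head on top of the fused representation, with a single overall loss.
W_head = rng.normal(scale=0.1, size=(joint.shape[1], 1))
prediction = joint @ W_head
target = np.zeros((batch, 1))  # made-up targets
joint_loss = float(np.mean((prediction - target) ** 2))
print(joint.shape, prediction.shape)  # (4, 24) (4, 1)
```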
Many other machine learning algorithms rely on batch learning, which requires all the data to be learned at once. Neural networks, by contrast, can learn sequentially: training can be paused, the data changed, or the architecture changed, and then resumed. This is opening up more applications.
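A minimal sketch of sequential learning is shown below. For brevity it uses a linear model rather than a neural network, which is a simplification on our part; the point is only that each update needs the current batch and the current weights, so training can stop and resume as new data arrives. The data, learning rate, and "true" weights are made up.

```python
import numpy as np

def sgd_update(w, X_batch, y_batch, lr=0.1):
    """One gradient step on mean squared error, using only the current batch."""
    grad = 2 * X_batch.T @ (X_batch @ w - y_batch) / len(X_batch)
    return w - lr * grad

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground truth
w = np.zeros(3)
initial_error = np.linalg.norm(w - w_true)

# Batches arrive one after another; earlier batches are never revisited.
for _ in range(200):
    X_batch = rng.normal(size=(20, 3))
    y_batch = X_batch @ w_true
    w = sgd_update(w, X_batch, y_batch)

final_error = np.linalg.norm(w - w_true)
```

Because the weights carry everything learned so far, the loop could be interrupted at any point and continued later with a different source of batches.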
One application is predicting the effects of airing a commercial, such as "what percentage of people will remember it" and "what percentage of people will want to buy".
--Yamazaki
"For example, deep learning can learn from various data such as image data, audio data, * metadata, on-screen captions, and narration all at once and make predictions. It's an approach that combines the Pre-train & Fine-tune described above with multimodal learning."
"GAN" is an algorithm that learns features from prepared data and generates pseudo data.
--Yamazaki
"GAN uses two neural networks, one that generates fakes and one that distinguishes real from fake, and pits them against each other to make the fakes ever closer to the real thing.
For example, in counterfeiting banknotes, it's easy to imagine the criminal trying to make counterfeit notes and the police and bankers trying to detect them, each working hard against the other. The fake-generating neural network does not work well at first, but it can be made more accurate with some ingenuity.
The neural network that distinguishes the real thing also becomes more accurate, because it is constantly shown the output of the fake-generating neural network. Ultimately, GAN produces fakes that are indistinguishable from the real thing."
This technique is also an example made possible by repeating learning both individually and as a whole.
* Additional data about the data itself that accompanies certain data
Example: industry, how the commercial is aired, etc.
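The competition between the two GAN networks described above can be expressed as two loss functions. The sketch below shows only those objectives, not a full training loop, and it assumes the commonly used binary cross-entropy form with a "non-saturating" generator loss; the discriminator scores are made-up numbers chosen to contrast an early stage of training with a late one.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples should score 1, generated samples 0."""
    return float(
        -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    )

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: the generator wants its fakes to score 1."""
    return float(-np.mean(np.log(d_fake + eps)))

# Hypothetical discriminator outputs (probability that a sample is real).
# Early in training the fakes are easy to spot.
early_real, early_fake = np.array([0.9, 0.95]), np.array([0.05, 0.1])
# Late in training the discriminator can no longer tell them apart.
late_real, late_fake = np.array([0.5, 0.5]), np.array([0.5, 0.5])

print(generator_loss(early_fake) > generator_loss(late_fake))  # True
print(discriminator_loss(early_real, early_fake)
      < discriminator_loss(late_real, late_fake))              # True
```

As the fakes improve, the generator's loss falls while the discriminator's loss rises toward the value it takes when it can only guess.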
Examples of AI business applications using deep learning
Deep learning is applied to various technical fields such as image recognition, voice recognition, natural language processing, prediction, video analysis, and anomaly detection. In the following, we will introduce practical examples.
"OOH AI" is a service that uses AI to generate oversized image materials. By using deep learning, it is possible to increase the resolution to hundreds of thousands of px, and to increase the resolution of photos and illustrations to 4 times the height and 4 times the width of the original image. It is mainly for advertising materials that you want to use for outdoor advertising and traffic advertising, and you can produce images for OOH quickly, at low cost, and with high quality.
Google Home is an AI speaker manufactured and sold by Google. An AI speaker is a speaker that has the function of extracting the speaker's commands by voice recognition and understanding and executing the instructions by natural language processing. Google Nest is hands-free and has features that help you in your daily life, such as research and translation. In addition, you can enjoy various entertainment such as playing music and playing game apps.
Influenza forecast is a service that predicts and visualizes the extent of influenza epidemics across the country. It uses a deep learning prediction algorithm based on data on the number of new influenza patients. It can predict the epidemic for each region from the current week up to 4 weeks ahead, and it expresses the severity of the epidemic in levels from 0 to 3, which can be useful for influenza prevention.
"People Counter Pro" is video analysis software released by Canon that uses deep learning to count crowds of thousands of people in real time from images taken with a network camera. By detecting people's heads in the video, it can count people even in crowded situations, display the number of people in a specified area, and show a graph of how that number changes over time. It can therefore be used to grasp and analyze congestion.
You can learn deep learning in a variety of ways.