Why Machine Learning Is Important: Case Studies
With the growth of big data, machine learning has become an important technology for solving problems in fields such as the following.
Financial engineering:
It uses mathematical finance and machine learning techniques to support trading, hedging, investment, and risk management decisions.
In financial engineering, machine learning has traditionally been associated with the pricing, valuation, and risk analysis of sell-side financial instruments.
Specific examples include researchers, quants, and analysts at banks, hedge funds, and asset management firms applying machine learning to financial product pricing, interest rate analysis, yield curve construction and analysis, and stochastic volatility modeling.
Machine learning is also used in algorithmic trading, where, in electronic financial markets, trading decisions are made by an algorithm that automatically determines the timing and quantity of stock orders and places them repeatedly. Algorithmic trading, which applies to both the sell side and the buy side, forms the basis for high-speed and high-frequency trading, forex trading, and the associated risk and execution analysis.
Image processing and computer vision:
Face recognition, motion detection, object detection
Machine learning is applied to extract the intended information from digital images captured by cameras and similar devices. The term commonly covers both image analysis, which retrieves the information, and the preprocessing performed to facilitate that analysis. Beyond removing noise and producing a clean image, image processing is increasingly used for image recognition, that is, finding a specific object in an image by computer. Because computers can now make the kind of intuitive judgments once made by human vision, there is a growing movement to replace visual inspections on production lines, which used to rely on human inspectors, with computer-based image inspection.
Its applications are expanding to more advanced targets such as medical images, face recognition, and optical character recognition (OCR). With the growth of the robotics market, motion detection and three-dimensional image processing for robot eyes (image sensors) are developing into a research field called computer vision.
Information Life Sciences:
Medical diagnosis, drug discovery
In the fields of life science and healthcare, the development and spread of a wide variety of sensors has accumulated diverse data on a large scale and expanded the scope of machine learning. Practical applications in medicine and drug discovery are growing, mainly in medical diagnosis, pathological image analysis, tumor detection, drug discovery target identification, new drug molecule design, and cosmetics development.
Energy production:
Accurate forecasting of electricity demand is essential for utilities to minimize costs and optimize operational efficiency and reliability. Machine learning techniques make it possible to forecast power demand by modeling historical usage patterns.
Predictive maintenance:
Automotive, aerospace, manufacturing, electricity, gas
Machine learning can also be applied to optimize costs by performing maintenance at the right time before a failure occurs, an approach called predictive maintenance. In general, machine maintenance includes corrective maintenance, performed after a failure occurs, and preventive maintenance (for example, vehicle inspection), performed at regular intervals. Predictive maintenance, by contrast, determines the maintenance time by correctly judging the state of the machine from data, so it not only reduces costs by eliminating unnecessary maintenance but is also expected to avoid sudden failures and improve safety. As IoT technology increases the adoption of sensors in equipment, predictive maintenance can be applied in many fields regardless of industry.
Natural language processing:
Text recognition, speech recognition application, OCR
Natural language processing applies machine learning to human language. From the spoken words of everyday conversation to written text such as documents and papers, it analyzes and classifies the meaning of words. Examples include chatbots that respond to inquiries using natural language processing technology.
In machine learning, the more data you have, the better your answer
Machine learning algorithms help you find natural patterns in your data, generate insights from them, and make better decisions and predictions. They are used daily to make important decisions in medical diagnosis, stock trading, and energy load forecasting. For example, media portals use machine learning to recommend songs and movies from millions of choices, and retailers use it to gain insights from their customers' buying behavior.
When to use machine learning
Consider using machine learning if you have a complex task or problem with no given formula or equation that involves a large dataset and a large number of variables. Machine learning is a good choice in situations like the following:
Handwritten rules and equations are too complicated, as in face recognition and speech recognition.
The target rules are constantly changing, as in fraud detection from transaction history.
The nature of the data keeps changing and the program needs to adapt, as in automated trading, energy demand forecasting, and consumption trend forecasting.
Machine learning mechanism
Two methods are used in machine learning. One is supervised learning, which trains a model on known input and output data so that it can predict future outputs. The other is unsupervised learning, which finds hidden patterns and intrinsic structures in input data. Deep learning, which is attracting much attention, is a machine learning algorithm that can be applied to both supervised and unsupervised learning. Let's look at the differences between AI, machine learning, and deep learning, and then at the algorithms for supervised and unsupervised learning.
Figure 1. Machine learning techniques include both supervised and unsupervised learning.
What is the difference between AI / machine learning and deep learning?
To explain these often conflated terms: AI (artificial intelligence) is the broadest concept. Machine learning is part of AI, and deep learning is both part of AI and one of the algorithms of machine learning. The difference between machine learning and deep learning can be explained simply as a difference in information processing capability and speed.
In machine learning, humans generally define the features (the numerical values, images, and other data used for prediction and classification). In deep learning, by contrast, a huge number of features are learned automatically using the neural network technology described later. Deep learning can therefore predict and classify with high accuracy and is used especially in speech, language, and image recognition.
Figure 2. AI involves machine learning, and one of the machine learning algorithms is deep learning.
Supervised learning
Supervised machine learning builds a model that makes evidence-based predictions in the face of uncertainty. Supervised learning algorithms train a model with an existing set of input data and its response (output) so that the response to new data can be reasonably predicted. If you have existing response (output) data for the event you are trying to predict, use supervised learning.
In supervised learning, predictive models are created using classification and regression techniques.
Classification method
Classification methods predict discrete responses, for example, whether an email is genuine or spam, or whether a tumor is suspected of being cancerous. A classification model is trained to categorize data into classes. Applications include medical imaging, speech recognition, and credit scoring.
Use classification techniques if your data can be tagged, categorized, or separated into specific groups or classes. For example, handwriting recognition applications use classification to recognize letters and numbers. In image processing and computer vision, unsupervised pattern recognition techniques are used for object detection and image segmentation.
Common algorithms for performing classification include support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbors, naive Bayes, discriminant analysis, logistic regression, and neural networks.
Each algorithm is explained below.
Support Vector Machine (SVM)
A support vector machine (SVM) is a supervised learning algorithm that can be used for binary classification and regression. It belongs to the class of machine learning algorithms called kernel methods and is also called a kernel machine. A support vector machine performs classification by finding the hyperplane that maximizes the separation margin between two classes in the data, that is, the distance between the separating boundary and the nearest data points of each class. Support vector machines are among the most widely recognized learning models today and are often used in applications such as natural language processing, speech recognition, image recognition, and computer vision.
Figure 3. Image of Support Vector Machine (SVM)
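As a minimal sketch of how this looks in practice (assuming Python with scikit-learn, with synthetic data standing in for a real two-class problem), an SVM classifier can be trained and evaluated like this:

```python
# Minimal SVM classification sketch; the data and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for any two-class problem.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An RBF-kernel SVM; C trades margin width against training errors.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```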
Boosted and bagged (random forest) decision trees
A decision tree is a machine learning method that performs classification and regression using a tree structure in which the relevant features appear in an easy-to-understand way. Because the classification results are visualized as a tree, you can visually grasp which factors are strongly related to the objective variable and under which conditions particular characteristics appear.
Boosting and bagging (as in random forests) are machine learning techniques that improve the prediction accuracy of decision trees. Bagging does not build a single decision tree: it builds many trees, each from a resampled subset of the training data, and finally integrates and evaluates the results of all the trees. Performing many such trainings improves overall prediction accuracy. Boosting first builds a decision tree on all or part of the data, then builds the next tree with heavier weights on the data points that were predicted incorrectly, so that the wrongly predicted data is handled better. Repeating this adjustment produces multiple decision trees whose combined results improve prediction accuracy.
Figure 4. Image of boosted and bagged decision trees
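The following sketch contrasts the two approaches, assuming Python with scikit-learn; the data is synthetic and the hyperparameters are illustrative only:

```python
# Bagged trees (random forest) vs. boosted trees on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: many trees trained on bootstrap resamples, results averaged.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: trees added sequentially, each focusing on previous errors.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("random forest", forest), ("boosted trees", boosted)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, "mean CV accuracy:", scores.mean())
```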
k-nearest neighbor method
The k-nearest neighbor algorithm is one of the simplest supervised learning algorithms. It classifies an unknown data point by a majority vote among the k most similar data points (nearest neighbors) in the dataset, assigning it to the class that appears most often among them. The algorithm is simple and fast even on large datasets and has also been used in product recommendation systems.
Figure 5. Image of k-nearest neighbor method
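A minimal sketch, assuming Python with scikit-learn and its bundled iris dataset, shows the majority-vote idea directly:

```python
# k-nearest neighbor classification: a new point takes the majority class
# among its k nearest training points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 neighbors vote
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```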
Naive Bayes
Naive Bayes is based on applying Bayes' theorem under the assumption that the features are independent of one another. Its advantages are that it can estimate the features required for classification even with little training data, it runs quickly with a small computational load, and it is not easily affected by unimportant features. Application examples include predicting train congestion, disaster prediction such as flooding, and automatically classifying articles into categories (politics, sports, and so on).
Figure 6. Image of naive Bayes
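For illustration, here is a hedged sketch of the article-classification example, assuming Python with scikit-learn; the tiny corpus and labels are invented:

```python
# Naive Bayes article-category classification on an invented toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["the election results were announced",
        "the team won the championship game",
        "parliament passed the new budget",
        "the striker scored twice in the final"]
labels = ["politics", "sports", "politics", "sports"]

vec = CountVectorizer()
X = vec.fit_transform(docs)  # word counts as (assumed independent) features

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["the budget vote is tomorrow"])))
```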
Discriminant analysis
Discriminant analysis groups observations into specific and non-specific targets based on the characteristics of the feature data. It is used in marketing, for example, to investigate what separates "buyers" from "non-buyers" or "repeat customers" from "one-time purchasers".
Figure 7. Image of discriminant analysis
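A hedged sketch of the buyer/non-buyer example, assuming Python with scikit-learn; the two features (visits per month, average spend) and the data are invented for illustration:

```python
# Linear discriminant analysis separating two hypothetical customer groups.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Invented features: visits per month, average spend.
buyers = rng.normal([8, 50], [2, 10], size=(50, 2))
non_buyers = rng.normal([3, 20], [2, 10], size=(50, 2))
X = np.vstack([buyers, non_buyers])
y = np.array(["buyer"] * 50 + ["non-buyer"] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[7, 45]]))  # classify a new customer
```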
Logistic regression
Logistic regression analysis is an effective technique for multivariate analysis when the objective (response) variable is qualitative and the explanatory variables are quantitative. It predicts whether an event will occur and the probability that it will occur. The objective variable is typically expressed by two values: for example, for or against, whether an election was won or lost, or whether a product was purchased or not. The value of the objective variable lies between 0 and 1 (0% and 100%); if it is greater than 0.5 the event is predicted to occur, and if it is less than 0.5 it is predicted not to occur. Binomial logistic regression is used when the objective variable has two categories, multinomial logistic regression when it has three or more, and ordinal logistic regression when it is ordinal.
Figure 8. Image of logistic regression
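A minimal sketch, assuming Python with scikit-learn, showing how the predicted probability is thresholded at 0.5:

```python
# Logistic regression: the model outputs the probability (0 to 1) that a
# binary event occurs; the 0.5 threshold gives the predicted class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X[:1])[0, 1]  # P(event occurs) for one sample
print("probability:", proba, "-> predicted class:", int(proba > 0.5))
```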
Neural network
Neural networks are one of the fundamental topics when learning about machine learning and deep learning. A neural network is a mathematical model with problem-solving ability that artificially imitates the network structure of nerve cells (neurons) in the human brain. Learning is performed by combining multiple layers (hidden layers) of artificial neurons (nodes) connected to one another like synapses. Neural networks can approximate complex functions, so they can handle classification and regression problems that cannot be solved without such complex function approximation. They are used for image recognition, speech recognition, pattern recognition, data classification, and forecasting; examples include optimizing power generation efficiency by accurately predicting the load on a utility's power grid, reading account numbers and amounts on checks at ATMs, and classifying tumors as benign or malignant.
An algorithm built from many layers of these neural network nodes, extending deep, is called deep learning.
Figure 9. Typical neural architecture. From the left, it consists of an input layer, a hidden layer, and an output layer.
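A minimal sketch of the architecture in Figure 9, assuming Python with scikit-learn; the single hidden layer of 32 nodes is an arbitrary illustrative choice:

```python
# A feedforward neural network with one hidden layer (input -> hidden -> output).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # small digit images, flattened
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Input layer -> 32-node hidden layer -> output layer (10 classes).
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```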
Regression method
Regression techniques in machine learning predict continuous responses. They have a wide range of applications, such as sales forecasting, demand forecasting, store-visitor forecasting, economic analysis, temperature forecasting, time-to-failure estimation for equipment, electrical load forecasting, and algorithmic trading.
Common regression algorithms include linear regression, non-linear regression, stepwise regression, boosted and bagged decision trees, and neural networks. Regularization is also often used as a way to prevent overfitting.
Linear regression
First, regression analysis predicts and explains a response variable (objective variable) using one or more predictor variables (explanatory variables). In general, the response variable is the data you want to predict and analyze, and the predictor variables are the data you use to do so. The general formula for a linear regression model has the form:
y = β0 + Σi βiXi + ε (where y is the response variable, the Xi are the predictors, the βi are the coefficients of the linear equation, and ε is the error term)
Linear regression uses one or more independent predictor variables to estimate the coefficients of a linear equation that best predicts the response variable. Linear regression is generally divided into simple regression, with one predictor (explanatory variable) for one response variable; multiple regression, with multiple predictors; and multivariate regression, which models multiple response variables.
Figure 10. Image of linear regression
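A minimal sketch, assuming Python with NumPy and scikit-learn, recovering β0 and β1 from noisy synthetic data generated with known coefficients:

```python
# Simple linear regression: estimate beta_0 and beta_1 in
# y = beta_0 + beta_1 * x + eps from noisy synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(0, 1.0, size=100)  # true coefficients

model = LinearRegression().fit(x, y)
print("intercept (beta_0):", model.intercept_)
print("slope (beta_1):", model.coef_[0])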
Nonlinear regression
Nonlinear regression generates an equation describing the nonlinear relationship between a continuous response variable and one or more predictor variables, and uses it to predict new observations.
y = f(X, β) + ε (where f is a nonlinear function of the predictors X and the parameters β, and ε is the error term)
Use nonlinear regression instead of ordinary linear regression when the relationship cannot be modeled adequately with linear parameters. Nonlinear regression models of this kind are considered parametric; other, nonparametric nonlinear regression uses machine learning techniques. Parametric nonlinear regression models the response variable (objective variable) as a nonlinear function of the parameters combined with one or more explanatory variables. It can be univariate (one response variable) or multivariate (multiple response variables), and the nonlinear function can take the form of an exponential, trigonometric, power, or other nonlinear function.
Figure 11. Image of non-linear regression
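A minimal sketch of parametric nonlinear regression, assuming Python with SciPy; the exponential model and the synthetic data are illustrative:

```python
# Parametric nonlinear regression with curve_fit: fit y = a * exp(b * x) + eps.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)  # nonlinear in the parameter b

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y = model(x, 1.5, 0.8) + rng.normal(0, 0.05, size=x.size)

params, _ = curve_fit(model, x, y, p0=[1.0, 1.0])
print("estimated a, b:", params)
```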
Regularization
Regularization is used when training machine learning models, especially to prevent overfitting and improve generalization. Overfitting refers to a model that evaluates certain data with high accuracy but cannot fit other, unknown data, resulting in lower overall accuracy; this is also described as a failure to generalize. Regularization adds information in the form of a penalty on the complexity and non-smoothness of the model, preventing overfitting and improving generalization. Regularized linear regression includes ridge regression, lasso regression, and elastic net regression.
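A minimal sketch, assuming Python with scikit-learn, of ridge and lasso on data deliberately prone to overfitting (many features, few informative):

```python
# Regularized linear regression: ridge (L2 penalty) and lasso (L1 penalty,
# which can shrink coefficients all the way to zero).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Many features, few informative ones: a setting prone to overfitting.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # alpha controls penalty strength
lasso = Lasso(alpha=1.0).fit(X, y)
print("nonzero lasso coefficients:", sum(c != 0 for c in lasso.coef_))
```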
Stepwise regression
Stepwise regression is a regression analysis method that automatically adds and deletes predictor variables (explanatory variables) one at a time to select a highly accurate model. In multivariate methods such as multiple regression, choosing the predictor variables incorrectly is called model misspecification: for example, omitting an important predictor that affects the response variable (objective variable), or adding a predictor that does not affect it, misspecifies the model. With the stepwise method, statistical software automatically selects a combination of valid predictor variables, letting you build a well-fitted model.
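scikit-learn has no classical stepwise regression, but as a hedged sketch, its SequentialFeatureSelector performs the closely related forward selection, adding predictors one at a time:

```python
# Forward feature selection, a close relative of stepwise regression:
# predictors are added one at a time while they improve the model.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="forward")
selector.fit(X, y)
print("selected predictors:", selector.get_support(indices=True))
```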
Supervised Learning Example: Predicting a Heart Attack
Suppose a doctor wants to predict whether a patient will have a heart attack within a year. The doctor has data on many past patients, such as age, weight, height, and blood pressure, and also knows whether each of those patients had a heart attack within a year. The question is how to combine this data into a model that predicts whether a new patient will have a heart attack within a year.
Using Supervised Learning to Predict Heart Attacks
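As a hedged sketch of how such a model might be built (assuming Python with scikit-learn; the feature columns and the handful of records are entirely invented for illustration):

```python
# Hypothetical heart-attack example: supervised learning on past patients.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented columns: age, weight (kg), height (cm), systolic blood pressure.
X_past = np.array([[63, 90, 170, 150],
                   [45, 70, 180, 120],
                   [58, 85, 165, 145],
                   [36, 60, 175, 110]])
had_attack = np.array([1, 0, 1, 0])  # hypothetical one-year outcomes

model = LogisticRegression(max_iter=1000).fit(X_past, had_attack)
new_patient = [[50, 80, 172, 135]]
print("estimated risk:", model.predict_proba(new_patient)[0, 1])
```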
Unsupervised learning
Unsupervised learning discovers hidden patterns and unique structures inherent in data. It is used to derive inferences from a set of input data that does not have a labeled response.
Clustering is the most common unsupervised learning technique. It is used in exploratory data analysis to discover hidden patterns and group structures in data. Clustering is used for gene sequence analysis, market research, and object recognition.
For example, if a mobile operator wants to optimize the locations of its relay towers, machine learning can be used to estimate the number of clusters of tower users. Because a mobile phone connects to only one relay station at a time, a clustering algorithm is used to design tower placement that gives each group, or cluster, of customers optimal signal reception.
Common algorithms for performing clustering include k-means and k-medoids, hierarchical clustering, Gaussian mixture models, hidden Markov models, self-organizing maps, fuzzy c-means clustering, and subtractive clustering.
Figure 12. Clustering finds hidden patterns in the data.
k-means
k-means clustering is one of the most widely used methods of non-hierarchical clustering. Non-hierarchical clustering groups similar data in a dataset into the number of clusters determined by the analyst. The k-means method first divides the data into k clusters, then repeatedly computes the distance between each cluster centroid and each data point and reassigns each point to the nearest cluster, until the cluster assignments stabilize. Keep in mind, however, that k-means depends strongly on the initial random partition, so a single run may not give the best result.
Figure 13. Image of k-means clustering
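A minimal sketch, assuming Python with scikit-learn; n_init reruns the random initialization several times, which addresses the dependence on the initial partition noted above:

```python
# k-means: choose k, then points are iteratively reassigned to the nearest
# centroid; n_init=10 repeats the random initialization to pick the best run.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print("cluster centers:\n", km.cluster_centers_)
```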
Hierarchical clustering
Hierarchical clustering is a simple and effective clustering method when the number of clusters in the data is not known in advance. Starting from a dataset of N points, the initial state consists of N clusters, each containing a single data point. The two clusters with the smallest distance between them are then merged, and this is repeated until all the data is merged into one cluster. The tree created by this process is called a dendrogram; rather than a flat collection of clusters, it represents a multilevel hierarchy in which clusters at one level join to form clusters at the next level.
Figure 14. In hierarchical clustering, the hierarchical relationship between clusters is represented by a tree diagram.
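A minimal sketch, assuming Python with SciPy (plus scikit-learn's synthetic-data helper); linkage() records the merge history (the dendrogram), and fcluster() cuts it into flat clusters:

```python
# Agglomerative hierarchical clustering: linkage() repeatedly merges the two
# closest clusters; fcluster() cuts the resulting tree into flat clusters.
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

Z = linkage(X, method="ward")                    # the merge history
labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
print(labels)
```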
Gaussian mixture model (GMM)
A Gaussian mixture model (GMM) is often used for data clustering. Clustering with a Gaussian mixture model is useful when data points may belong to more than one cluster, or when clusters differ in size and have different correlation structures; in such cases it may be more appropriate than k-means clustering.
A Gaussian mixture model estimates how data points are distributed by superimposing multiple Gaussian (normal) distributions in a linear combination and fitting the parameters of that mixture.
With a Gaussian mixture model you can not only divide a dataset into clusters but also obtain the probability density distribution of the dataset, which can then be applied to new sampling, regression analysis, and classification. A fitted Gaussian mixture model clusters the given data by assigning each data point to the multivariate normal component with the highest posterior probability.
Figure 15. Image of a Gaussian mixture model
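A minimal sketch, assuming Python with scikit-learn, showing both the hard assignment by highest posterior probability and the soft per-component probabilities:

```python
# Gaussian mixture clustering: each point is assigned to the component with
# the highest posterior probability; soft memberships are also available.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
hard_labels = gmm.predict(X)        # most probable component per point
soft_probs = gmm.predict_proba(X)   # posterior probability per component
print(soft_probs[0])
```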
Hidden Markov Model (HMM)
The hidden Markov model (HMM) is a stochastic model: a Markov process with unobserved (hidden) states. Its main purpose is to infer the sequence of state transitions behind an observed symbol sequence. Hidden Markov models are used in fields such as DNA analysis and speech recognition; applied to audio signals, they are used for anomaly detection, speech recognition, and speech synthesis.
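As a hedged illustration of "inferring the state sequence behind the observations", here is a hand-rolled Viterbi decoder in plain NumPy; all the probabilities below are invented for illustration:

```python
# Viterbi decoding: find the most likely hidden state sequence of an HMM
# given an observed symbol sequence.
import numpy as np

start = np.array([0.6, 0.4])               # P(initial hidden state)
trans = np.array([[0.7, 0.3],              # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],          # P(observed symbol | state)
                 [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2]                         # the observed symbol sequence

# Forward pass: log-probability of the best path ending in each state.
logv = np.log(start) + np.log(emit[:, obs[0]])
back = []
for o in obs[1:]:
    scores = logv[:, None] + np.log(trans)  # scores[i, j]: transition i -> j
    back.append(scores.argmax(axis=0))      # best predecessor of each j
    logv = scores.max(axis=0) + np.log(emit[:, o])

# Backward pass: trace the best path from the most likely final state.
state = int(logv.argmax())
path = [state]
for ptr in reversed(back):
    state = int(ptr[state])
    path.append(state)
print("most likely hidden states:", path[::-1])
```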
Self-organizing map
Self-organizing maps are a clustering method based on neural networks. The similarity of input data is expressed as distance on a map, and classification happens automatically. They can cluster features in high-dimensional data that humans would find difficult to identify, without prior knowledge.
Self-organizing maps have been applied in areas such as economic analysis, civil engineering, facial recognition, search engines, and medical examination.
Figure 16. Image of self-organizing map
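A hedged sketch of the core SOM update written directly in NumPy (no dedicated SOM library is assumed); grid size, learning rate, and neighborhood schedule are illustrative:

```python
# Self-organizing map: nodes on a 2-D grid compete for each input, and the
# winner and its grid neighbors move toward the input.
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3            # a 5x5 map of 3-D weight vectors
weights = rng.random((grid_h, grid_w, dim))
coords = np.dstack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                               indexing="ij"))  # grid position of each node
data = rng.random((200, dim))            # synthetic inputs for illustration

for t, x in enumerate(data):
    frac = 1 - t / len(data)             # decay schedule over training
    lr = 0.5 * frac                      # learning rate
    radius = 2.0 * frac + 0.5            # neighborhood radius on the grid
    # Best matching unit: the node whose weights are closest to the input.
    bmu = np.unravel_index(np.linalg.norm(weights - x, axis=2).argmin(),
                           (grid_h, grid_w))
    # Pull the BMU and its grid neighbors toward the input.
    grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
    influence = np.exp(-grid_dist ** 2 / (2 * radius ** 2))
    weights += lr * influence[..., None] * (x - weights)

print("trained map of", grid_h * grid_w, "prototype vectors:", weights.shape)
```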
Fuzzy c-means clustering
Fuzzy c-means clustering is a data clustering technique that groups a dataset into N clusters. It is useful when the number of clusters is known and the clusters overlap.
In fuzzy c-means clustering, every data point in the dataset belongs to every cluster: data near the center of a cluster has a high degree of membership in that cluster, while points farther from the center have a lower degree of membership.
The algorithm starts by randomly guessing the cluster centers and assigning each data point a random membership grade for each cluster. Iteratively updating the cluster centers and the membership grades moves the centers to the correct locations in the dataset and computes each point's degree of membership in each cluster. The iteration minimizes an objective function in which the distance from each data point to each cluster center is weighted by that point's membership in the cluster.
Figure 17. Image of fuzzy c-means clustering
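A hedged sketch of this update loop written directly in NumPy (no dedicated fuzzy-clustering library is assumed); the fuzzifier m = 2 is a common default:

```python
# Fuzzy c-means: every point gets a membership grade in every cluster;
# centers and memberships are updated alternately until they stabilize.
import numpy as np

rng = np.random.default_rng(0)
# Two overlapping synthetic blobs, purely for illustration.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
c, m = 2, 2.0                            # number of clusters, fuzzifier

u = rng.random((len(X), c))
u /= u.sum(axis=1, keepdims=True)        # random initial membership grades

for _ in range(100):
    um = u ** m
    centers = (um.T @ X) / um.sum(axis=0)[:, None]   # weighted cluster means
    d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
    # u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1))
    u_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    if np.abs(u_new - u).max() < 1e-6:   # stop when memberships stabilize
        u = u_new
        break
    u = u_new

print("cluster centers:\n", centers)
```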
How to decide which machine learning algorithm to use
Choosing the right machine learning algorithm can seem daunting: there are dozens of supervised and unsupervised machine learning algorithms, each taking a different approach to learning.
There is no single best method that works for everything. Finding the right algorithm involves trial and error; even the most experienced data scientists cannot tell whether an algorithm will work without trying it. That said, the choice of algorithm depends in part on the size and type of the data, the insights you want to derive from it, and how those insights will be used.
Figure 18. Machine learning techniques
Here are some guidelines for choosing between supervised learning and unsupervised machine learning:
Choose supervised learning if you need to train a model that makes predictions (for example, estimating future values of continuous variables such as temperature or stock prices) or classifications (for example, identifying the model of a car in a web video).
Choose unsupervised learning if you need to dig into the input data or if you need to train a model that finds a good internal representation of the data, such as partitioning the data into clusters.