Intro to AutoML

by Steve Deng


According to Wikipedia, “Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.” Machine learning algorithms typically train a model to make decisions or predictions on the basis of data. Deep learning is a family of machine learning techniques that uses multi-layer artificial neural networks as the model representation. In the past decade, ML, and deep learning in particular, has made unprecedented breakthroughs in applications such as image processing, face recognition, natural language processing, predictive maintenance, and scientific discovery. Today our daily lives depend heavily on intelligent applications empowered by machine learning models.

Building machine learning models, however, turns out to be challenging. First, data scientists analyze the problem and the data samples in order to design the form of the model. Second, the parameters of a given model, which can number in the millions or even hundreds of millions, have to be trained on powerful computing machines. These two steps typically need to be iterated before a satisfactory model is derived. As a result, the model-building process has become the development bottleneck in many AI applications. The problem is even more pressing in IoT applications, where a huge number of models have to be developed.

AutoML, i.e., automated machine learning, is a family of technologies for building a model with little or no human intervention. The idea of AutoML can be traced back to the early 2000s: incrementally tweak a template consisting of pipelined machine learning modules with adjustable parameters, as sketched below.
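
A minimal sketch of this pipeline-tweaking idea in Python, using scikit-learn (the synthetic dataset and the parameter grid here are illustrative assumptions, not a prescription):

    from sklearn.datasets import make_classification
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # A fixed pipeline template: preprocessing followed by a classifier.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    # AutoML in its earliest form: automatically adjust the template's
    # knobs (here, just the regularization strength C) and keep the best.
    search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

Modern AutoML systems search over the pipeline structure itself rather than a fixed grid of settings, but the shape of the loop is the same.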

As DNN-based model representations have become more prevalent, an important part of AutoML is now neural architecture search (NAS): automatically searching for a proper network architecture together with its parameters (trained on input data) and hyper-parameters (traditionally set by human experts). Doing this requires three key techniques.

First, we define a solution space that covers (almost) all possible forms of deep neural networks. An early approach was to represent a deep neural network as a chain of operations, but recent works tend to use a directed acyclic graph (DAG) as a unified representation. With such a representation, each DNN corresponds to a point in the solution space with a specific encoding, and an initial solution can be picked at random as the starting point of NAS.

The second technique is a search strategy, which selects the next solution according to the evaluation of the current one. The essential point is that the search strategy can find a sufficiently optimized solution without traversing the whole solution space. Today, evolutionary algorithms and reinforcement learning are commonly used as search strategies.

The third technique is an evaluation method that quickly assesses the quality of the current solution. This could be done by fully training the model corresponding to the current solution and then computing its error, but that would be too slow, so in practice we rely on approximate estimation.

With the above techniques, we can build an optimization framework that incrementally searches for an optimized DNN model for a given problem. A toy sketch of this loop is given below.
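
The following sketch shows how these three pieces fit together in an evolutionary search loop. Everything here is an illustrative stand-in: real NAS systems encode architectures as DAGs rather than simple chains, and the scoring function would briefly train each candidate (or use tricks such as weight sharing) instead of faking a number.

    import random

    # Solution space (assumed for illustration): an architecture is a
    # chain of DEPTH layer choices drawn from a small operation set.
    OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]
    DEPTH = 6

    def random_architecture():
        return [random.choice(OPS) for _ in range(DEPTH)]

    def mutate(arch):
        # Search strategy: perturb one position of the encoding.
        child = list(arch)
        child[random.randrange(DEPTH)] = random.choice(OPS)
        return child

    def estimate_quality(arch):
        # Evaluation: a stand-in score. A real system would train the
        # candidate for a few steps and measure validation accuracy.
        return sum(1.0 for op in arch if op != "identity") + random.random()

    def evolutionary_search(generations=100):
        # A (1+1) evolutionary strategy: mutate the incumbent and keep
        # whichever of parent and child scores higher.
        best = random_architecture()
        best_score = estimate_quality(best)
        for _ in range(generations):
            child = mutate(best)
            score = estimate_quality(child)
            if score > best_score:
                best, best_score = child, score
        return best, best_score

    arch, score = evolutionary_search()
    print(arch, round(score, 2))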

In the early days of NAS, it would take a few days to finish the whole process for a DNN model of medium complexity. Recent technical developments have shortened the time to hours on a reasonable number of GPU cards. State-of-the-art NAS frameworks are capable of generating DNN models of good quality, in terms of both prediction accuracy and computing efficiency, for many applications. In fact, many researchers have already begun to design DNNs by starting from solutions derived by NAS.

It should be noted that AutoML, and NAS in particular, is still a compute-hungry job. The search process has to train many candidate solutions, and each training run demands a lot of computing power. In other words, GPUs and other high-performance computing platforms are even more critical to the success of AutoML.