AutoML for predictive modeling
Automating machine learning is a topic of growing importance, as the first results are being used in practice and bring significant cost reductions. This article consists of slides with commentary, which I used for my talk at the ML Prague conference. It maps state-of-the-art techniques and open source AutoML frameworks, mostly in the field of predictive modeling.
I have also presented our research, which is partially funded by Showmax within our joint laboratory at the Faculty of Information Technology of the Czech Technical University in Prague.
I would also like to express gratitude for the computational resources provided by Showmax, which make our extensive experiments possible.
Let’s start with motivation from a recent Google marketing video explaining how Google AI applies its AutoML at Waymo.
Deep learning AutoML
Google AI applied AutoML to find better alternative architectures of convolutional networks that classify images in self-driving cars.
The marketing department is reporting a 30% speedup with an 8% accuracy improvement, which is impressive.
When you take a closer look, you find out that it is actually not AND but OR. You can have up to a 30% speedup without sacrificing accuracy, or an 8% accuracy boost at the same speed. Anyway, those are still good numbers, and AutoML is therefore worth the effort. Every dot in the graph represents a convolutional neural network, and the goal is to find a network whose architecture gives the best classification performance with minimal latency (top left).
CNN architectures are evaluated on a simple proxy task to reduce computational cost, and several search strategies are compared. Blue points are CNNs evaluated by the Neural Architecture Search strategy described later in this post.
These experiments resemble the older study on large-scale evolution of CNNs, which also required massive computational resources.
But it is still cheap and easy when compared to architecture search for video classifiers, where 3D convolutional neural networks are being optimized.
Evolutionary search was also used to optimize both the encoder and decoder parts of the Transformer, showing that neural translation models can also be well optimized when it comes to architecture choices of text processing convnets.
Automatic Deep Model Compression
To speed up recall (inference) and simplify deep convnet models, one can use the open source AutoMC framework PocketFlow.
AutoMC or AMC are increasingly useful when it comes to deploying accurate models to mobile devices or embedded systems. The reported speedup is quite impressive.
Deep AutoML Frameworks
Auto Keras uses network morphism to make the Bayesian optimization more efficient. Edit distance in the architecture space (the number of changes needed to get from one architecture to another) is used as a similarity proxy. As you see, the usage is straightforward and there are no parameters that have to be specified manually.
AutoML in general
We have surveyed deep learning AutoML approaches, but this is just one class of AutoML techniques you can find in predictive modeling. In general, AutoML approaches are most effective at optimizing the performance of predictive models, their speed, or both. Many other criteria can be taken into account, such as the explainability of predictive models; however, AutoML approaches capable of optimizing those criteria are in their infancy.
Data preprocessing for predictive modeling often needs to be performed in a more traditional, old-fashioned style, because convnets are applicable only to some data sources (images, text, speech, ...). There are also many predictive modeling methods apart from neural networks. These methods typically have a lot of hyperparameters. AutoML techniques that have been developed for this domain are discussed below.
First, let’s look at some general tools that can be used in AutoML search. Random search can be surprisingly efficient and is therefore quite popular. This is mostly because the architecture space is very difficult to search, as it often consists of mutually dependent continuous and discrete dimensions.
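A minimal sketch of random search over such a mixed space, where some dimensions only exist for certain model choices. The search space, the parameter names and the `toy_score` objective are illustrative assumptions; in practice the score would come from validating a real model.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration; dependent dimensions are sampled conditionally."""
    model = rng.choice(["tree", "knn"])            # discrete choice
    if model == "tree":
        # max_depth only exists when a tree is selected
        return {"model": model, "max_depth": rng.randint(1, 12)}
    # knn has a discrete k and a continuous, log-uniform kernel width
    return {"model": model,
            "k": rng.randint(1, 30),
            "width": 10 ** rng.uniform(-3, 1)}

def toy_score(cfg):
    """Hypothetical stand-in for a validation score (higher is better)."""
    if cfg["model"] == "tree":
        return 0.80 - 0.01 * abs(cfg["max_depth"] - 6)
    return (0.85 - 0.005 * abs(cfg["k"] - 7)
            - 0.02 * abs(math.log10(cfg["width"]) + 1))

def random_search(n_trials, seed=0):
    """Evaluate n_trials independent samples and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = toy_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(200)
print(best_cfg, round(best_score, 3))
```

Because every trial is independent, the loop parallelizes trivially, which is one practical reason for the popularity of random search.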
One of the more interesting techniques is Bayesian optimization using Gaussian processes. Let’s assume you have just one continuous hyperparameter. You explore the regions with the highest potential, that is, where the upper confidence bound of the model is highest.
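The one-dimensional case can be sketched in plain Python as follows. The RBF kernel, its length scale, the exploration weight `kappa`, the evaluation grid and the `objective` function are all assumptions chosen for illustration; a real setup would use a tested Gaussian process library and an actual validation score.

```python
import math

def rbf(a, b, length=0.15):
    """Squared-exponential kernel (assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def objective(x):
    """Hypothetical validation score of one hyperparameter x in [0, 1]."""
    return math.sin(3 * x) * x + 0.5 * x

def bayes_opt(n_iter=10, kappa=2.0):
    xs = [0.0, 1.0]                       # two initial evaluations
    ys = [objective(x) for x in xs]
    grid = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std     # upper confidence bound
        x_next = max(grid, key=ucb)       # most promising point so far
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

best_x, best_y = bayes_opt()
print(round(best_x, 2), round(best_y, 3))
```

A larger `kappa` favors exploring uncertain regions, while a smaller one concentrates evaluations around the current best estimate.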
Another interesting technique is Hyperband. It is a bandit-inspired approach that simultaneously trains and evaluates several architectures and kills half of them from time to time. This allocates exponentially more computational resources to the most promising architectures. The drawback is that you might kill would-be-great architectures that need more time to optimize.
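The halving step can be sketched as follows. This is successive halving, the inner routine that Hyperband repeats with different trade-offs between the number of candidates and the budget per candidate; the full Hyperband bracket schedule is omitted here. The `partial_score` function is a hypothetical learning curve standing in for training a model for a given number of epochs.

```python
import math
import random

def partial_score(cfg, epochs):
    """Hypothetical learning curve: approaches cfg['ceiling'] at rate cfg['speed']."""
    return cfg["ceiling"] * (1 - math.exp(-cfg["speed"] * epochs))

def successive_halving(configs, min_epochs=1, eta=2):
    """Score all survivors, keep the best 1/eta, multiply the budget by eta."""
    survivors = list(configs)
    epochs = min_epochs
    history = []                                   # (epochs, survivors kept)
    while len(survivors) > 1:
        ranked = sorted(survivors,
                        key=lambda c: partial_score(c, epochs),
                        reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        history.append((epochs, len(survivors)))
        epochs *= eta
    return survivors[0], history

rng = random.Random(0)
configs = [{"id": i,
            "ceiling": rng.uniform(0.6, 0.95),
            "speed": rng.uniform(0.05, 1.0)} for i in range(16)]
winner, history = successive_halving(configs)
print(winner["id"], history)

# The drawback in miniature: a slow starter with the higher ceiling is killed.
late_bloomer = {"id": "slow", "ceiling": 0.99, "speed": 0.02}
fast_start = {"id": "fast", "ceiling": 0.70, "speed": 2.0}
survivor, _ = successive_halving([late_bloomer, fast_start])
print(survivor["id"])
```

Hyperband hedges against this failure mode by also running brackets that start with fewer candidates and a larger initial budget.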
AutoML frameworks for data mining
Here we introduce some of the most popular AutoML frameworks for more traditional predictive models, often including data preprocessing.
TransmogrifAI also optimizes the data preparation stage (e.g. data projections).
There are many more similar tools, both open source (auto sklearn, hyperopt sklearn, auto weka) and commercial (h2o driverless, datarobot automl). Note that Google Cloud AutoML does not provide a similar tool yet. The services it provides are oriented towards the optimization of deep learning models.
Advanced AutoML approaches
Beyond simple hyperparameter optimization (even using advanced heuristics), there are many interesting approaches that can deal with architecture choices at both the micro and the macro level.
Neural Architecture Search
A clever approach to NAS is to map architecture choices into probability space and use backprop to optimize architecture together with weights as demonstrated in DARTS.
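The core idea of the continuous relaxation can be illustrated on a single edge with scalar inputs. This is a toy sketch under strong assumptions: the three candidate operations, the squared-error loss and the hand-derived gradient are chosen for illustration, the operations have no weights of their own, and the alternating weight and architecture updates used in DARTS are omitted.

```python
import math

# Candidate operations on one edge (toy, scalar-valued, no trainable weights).
OPS = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
}

def softmax(alpha):
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate operations."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def train_alpha(data, steps=200, lr=0.1):
    """Gradient descent on the architecture parameters alpha."""
    names = list(OPS)
    alpha = [0.0] * len(names)
    for _ in range(steps):
        grad = [0.0] * len(alpha)
        weights = softmax(alpha)
        for x, y in data:
            out = mixed_op(x, alpha)
            err = out - y
            for i, name in enumerate(names):
                # d(squared error)/d(alpha_i), propagated through the softmax
                grad[i] += 2 * err * weights[i] * (OPS[name](x) - out) / len(data)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return dict(zip(names, alpha))

# Target behaviour y = 2x, so the search should prefer the "double" operation.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
alpha = train_alpha(data)
chosen = max(alpha, key=alpha.get)      # discretize: keep the strongest op
print(chosen, {k: round(v, 2) for k, v in alpha.items()})
```

After the search, the discrete architecture is obtained by keeping only the operation with the largest architecture parameter on each edge.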
Another approach is to map the architecture into a continuous space and back, for example with an LSTM.
You can perform the gradient search in the continuous space as demonstrated by NAO.
Search for predictive ensembles
For the last two decades, I have been trying to find an efficient algorithm to search for architectures of predictive models, including complex ensembles.
Such ensembles often win Kaggle competitions and were nicknamed Frankenstein ensembles.
Meta-learning templates formalize hierarchical predictive ensembles, as we have explained in our Machine Learning journal article.
You can use an evolutionary algorithm (genetic programming, GP) to search for architectures.
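A toy sketch of how such a search over hierarchical templates can be organized. The template encoding (nested tuples), the subtree mutation, the truncation selection and the `fitness` function are illustrative assumptions, not the representation or operators of our framework; a real fitness would be a validated score of the ensemble built from the template.

```python
import random

BASE_MODELS = ["tree", "knn", "logreg"]
ENSEMBLES = ["bagging", "stacking", "boosting"]

def random_template(rng, depth=0, max_depth=2):
    """A template is a base model name or (ensemble_name, [child templates])."""
    if depth >= max_depth or rng.random() < 0.5:
        return rng.choice(BASE_MODELS)
    children = [random_template(rng, depth + 1, max_depth)
                for _ in range(rng.randint(2, 3))]
    return (rng.choice(ENSEMBLES), children)

def mutate(template, rng, depth=0, max_depth=2):
    """Replace a randomly chosen subtree with a freshly generated one."""
    if isinstance(template, str) or rng.random() < 0.3:
        return random_template(rng, depth, max_depth)
    name, children = template
    i = rng.randrange(len(children))
    mutated = mutate(children[i], rng, depth + 1, max_depth)
    return (name, children[:i] + [mutated] + children[i + 1:])

def size(template):
    if isinstance(template, str):
        return 1
    return 1 + sum(size(c) for c in template[1])

def leaves(template):
    if isinstance(template, str):
        return {template}
    return set().union(*(leaves(c) for c in template[1]))

def fitness(template):
    """Hypothetical score: rewards diverse base models, penalizes size."""
    return len(leaves(template)) - 0.1 * size(template)

def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_template(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]            # truncation selection
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring                # parents survive (elitism)
    return max(population, key=fitness)

best = evolve()
print(best, round(fitness(best), 2))
```

The size penalty in the toy fitness plays the role of a complexity control: without it, templates tend to grow without improving the score.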
Searching for good templates representing well-performing predictive ensembles is quite a complex task.
We show winning architectures for a couple of datasets from the UCI repository. Note that for some datasets, simple models are more appropriate than complex hierarchical ensembles (the ensembles do not perform better, so it is preferable to choose a simple predictive model).
You can evolve the architecture of a predictive model on one dataset and evaluate it on another dataset. Note that some datasets from UCI are so trivial (breast, wine) that it does not matter what you use for predictive modeling.
When we attempted to apply our evolutionary AutoML framework to the Showmax churn prediction dataset, it was not scalable enough.
Therefore we have reimplemented the framework for distributed data processing on top of Apache Spark. We are experimenting with new interesting concepts and features in the AutoML process.
- A gradually increasing amount of data is used for the evaluation of templates and the selection of prospects across evolutions
- We are able to increase the complexity of templates over time (generations), starting from a population of base models
- The process is divided into evolutions according to a user-defined amount of time (anytime learning)
- Various approaches are used to preserve diversity across the population as well as during the mutation of individual templates
- We perform multiple co-evolutions (templates and hyperparameters in the current version). We are able to share hyperparameters within the population of templates.
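Two of these ideas, the growing data sample and the time-bounded evolutions, can be sketched together. Everything in this sketch is an assumption made for illustration: the geometric growth schedule, the equal time slices, and the `run_evolution` callback with its signature are hypothetical and do not describe our Spark implementation.

```python
import time

def data_fraction(evolution, n_evolutions, start=0.1):
    """Geometrically grow the share of data used, reaching 1.0 in the last evolution."""
    if n_evolutions == 1:
        return 1.0
    return start ** (1 - evolution / (n_evolutions - 1))

def anytime_search(run_evolution, total_seconds, n_evolutions, clock=time.monotonic):
    """Run up to n_evolutions, each with an equal slice of the time budget.

    run_evolution(fraction, deadline, best) is a hypothetical user-supplied
    callback that evolves templates on the given data fraction until the
    deadline and returns the best (score, template) pair found so far.
    """
    start = clock()
    best = None
    for e in range(n_evolutions):
        if clock() >= start + total_seconds:
            break                                   # budget exhausted
        deadline = start + total_seconds * (e + 1) / n_evolutions
        best = run_evolution(data_fraction(e, n_evolutions), deadline, best)
    return best                                     # a result is always available

def demo_evolution(fraction, deadline, best):
    # Stand-in for a real evolution: pretend more data gives a better score.
    candidate = (round(0.7 + 0.2 * fraction, 3), f"template@{fraction:.2f}")
    return candidate if best is None or candidate > best else best

result = anytime_search(demo_evolution, total_seconds=1.0, n_evolutions=4)
print(result)
```

Because the best result so far is carried from one evolution to the next, the search can be stopped at any time and still return a usable model.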
Results on the Airlines data show that we can evolve predictive models that both train and recall in a fraction of the time needed by deep learning models of the same accuracy.
These simple predictive templates are not sensitive to hyperparameter tuning and perform consistently well on this task. Of course, you can find many other tasks where you need different base methods than those we have implemented so far; therefore, we are extending the portfolio.
Other AutoML domains
AutoML can be applied to clustering; however, it is much more difficult.
It is possible to increase the robustness of clustering algorithms and add some AutoML features such as automated cutoff. AutoML of cluster ensembles is difficult due to the unsupervised nature of clustering. You can join our efforts in this open source project.
In recommender systems, AutoML can optimize both the structure of the recsys ensemble and its hyperparameters. We managed to implement a large-scale online AutoML system in production at Recombee, combining bandits, GP and other approaches.
This work is also part of the Prague AI initiative.