TPOT and the Automatic Statistician
2019-07-03
Chapter 1 Introduction
Machine learning has gained in importance in recent years. While machine learning shows promising results in many problem domains, its complexity is often an obstacle to its application. This development has given rise to the research field of automated machine learning (AutoML), which investigates the problem of “automatically (without human input) producing test set predictions for a new dataset within a fixed computational budget” (Feurer et al. 2015). Characteristics of machine learning, such as the absence of a single method that performs best on all data sets and the need for hyperparameter optimization, have led to the development of a diverse set of AutoML methods and systems. The book Automated Machine Learning (Hutter, Kotthoff, and Vanschoren 2018) provides an excellent overview of selected AutoML methods and systems.
While the existing AutoML methods and systems cover many use cases, their differing user interfaces and programming languages run counter to the idea of AutoML, which is meant to be easy to use for a broad range of users. Making these methods and systems available in a single programming language such as R could facilitate their application and comparison.
Against this background, the goal of this report is to investigate the two AutoML systems TPOT and the Automatic Statistician and to document their implementation in R. TPOT's goal is to automate all steps of machine learning that follow preprocessing by generating and optimizing machine learning pipelines (Olson and Moore 2018). The Automatic Statistician aims to automate all steps of machine learning, with a special focus on making the predictions available in a human-readable report that includes graphs and statistics (Steinruecken et al. 2019).
To this end, the two AutoML systems are introduced separately in Chapter 2 (TPOT) and Chapter 3 (the Automatic Statistician). For each system, the core theoretical concept is described first, followed by a description of its implementation in R. The report concludes with a final evaluation of the individual methods and implementations in the context of AutoML's objectives.
References
Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. “Efficient and Robust Automated Machine Learning.” In Advances in Neural Information Processing Systems, 2962–70.
Hutter, Frank, Lars Kotthoff, and Joaquin Vanschoren, eds. 2018. Automated Machine Learning: Methods, Systems, Challenges. Springer.
Olson, Randal S., and Jason H. Moore. 2018. “TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning.” In Automated Machine Learning: Methods, Systems, Challenges, edited by Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren, 163–73. Springer.
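Steinruecken, Christian, Emma Smith, David Janz, James Lloyd, and Zoubin Ghahramani. 2019. “The Automatic Statistician.” In Automated Machine Learning: Methods, Systems, Challenges, edited by Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Springer.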