Chapter 4 Conclusion

This report documented and evaluated how two AutoML systems were made accessible in the programming language R. With tpotr and its interface to mlr, there is now an easy way to use tree-based pipeline optimization in R. Users with a basic understanding of R can generate well-fitting machine learning pipelines without having to select and tune machine learning operators manually. Because tpotr is implemented as an interface to the Python package TPOT, it benefits from improvements in future TPOT releases and from an actively maintained code base.

The Automatic Statistician, the second AutoML system, introduces the idea of interpretability, since its aim is to generate human-readable reports. This can be achieved by using interpretable models such as Gaussian processes. In this report a different approach was evaluated, one that separates model construction from model interpretation. This separation makes it possible to use other well-performing AutoML systems for the model construction part. AutoStatR implements this approach by using tpotr for model construction and selected model-agnostic methods such as ALE for model interpretation.

With regard to the objectives of AutoML described in the introduction, we made several observations during the development phase. The tpotr package clearly lowers the barrier to machine learning for users who have at least some experience with R. Moreover, tpotr can reduce the time spent on machine learning by giving a more experienced user insight into a useful pipeline for a given problem. While using tpotr can also have drawbacks, the bigger challenges of AutoML appear to be its interpretability and its applicability for very inexperienced users. AutoStatR provides a possible starting point for addressing these challenges, as it works well on selected data sets (which can be found in the example folder of the project), but more work and research are needed on interpretability and on handling arbitrary input data.

To contribute to the field of AutoML, and especially to facilitate its applicability in R, we plan to make tpotr, AutoStatR, and the corresponding documentation available as open source projects on GitHub (https://github.com/thllwg/tpotr and https://github.com/thllwg/AutoStatR).