KDD 2019 Tutorial on the Minimum Description Length Principle in Data Mining (MDLDM)

Saturday, August 4. Anchorage, Alaska.

When considering a data set it is often unknown how complex it is, and as a result it is **difficult to choose** a model that *describes its main characteristics without overfitting*. Often these choices are left to the domain expert, which is highly unsatisfactory in practice. The **Minimum Description Length (MDL) principle** approaches the model selection problem from a *clear and intuitively appealing viewpoint*: in a nutshell, it asserts that the best model is the one that compresses both the data and the model best. In this tutorial we not only give an **introduction** to the very basics of model selection, show **important properties** of MDL-based modelling, and present successful examples as well as **pitfalls** in applying MDL to data mining problems, but also introduce **advanced topics** covering important new concepts in modern MDL and emerging applications in dynamic settings. Throughout, our goal is to *ensure that the audience grasps not only the basic theory, but also sees how it can be put into practice*.

**Selecting a model** for a given set of data is at the heart of what data analysts do, whether they are statisticians, machine learners, or data miners. However, as the philosopher Hume already pointed out, the "*Problem of Induction*" is unsolvable: there are infinitely many functions that pass through any finite set of points. It is therefore not surprising that there are many different principled approaches to guide the search for a good model. Well-known examples are Bayesian statistics and statistical learning theory.

In the last decade, information-theoretic methods for selecting the best model have slowly but surely become popular in the data mining community, and have led to state-of-the-art solutions in areas as diverse as pattern-based modelling, change detection, and causal inference. In this tutorial we will review the **state-of-the-art in information-theoretic model selection** based on the Minimum Description Length principle and its **implications and applications in data mining**.

The **first part** of the tutorial gives a **gentle introduction to the Minimum Description Length** principle.
We will discuss topics such as model selection, Occam's Razor, **Two-Part MDL** and *its relation to AIC, BIC, MML, and Kolmogorov Complexity*. We will discuss the strengths and weaknesses of two-part MDL, after which we introduce the modern concept of Refined, or one-part, MDL.
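To make the two-part idea concrete, here is a minimal, self-contained sketch (not material from the tutorial itself) that selects a polynomial degree by minimizing L(M) + L(D|M). The (k/2) log2 n parameter cost and the idealized Gaussian codelength for the residuals are common textbook choices; all names and the exact encoding are illustrative assumptions.

```python
# Toy two-part MDL model selection: pick a polynomial degree for noisy data.
# Total cost = L(M) + L(D | M), both measured in bits.
import numpy as np

def two_part_mdl(x, y, degree):
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma2 = max(np.mean(residuals ** 2), 1e-12)
    # L(D | M): idealized Gaussian codelength of the residuals
    data_cost = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)
    # L(M): each of the (degree + 1) parameters costs about (1/2) log2 n bits
    model_cost = 0.5 * (degree + 1) * np.log2(n)
    return model_cost + data_cost

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.5 * x ** 2 - x + rng.normal(scale=0.1, size=x.size)  # true degree: 2
best = min(range(6), key=lambda d: two_part_mdl(x, y, d))
print(best)
```

Note how the two terms pull in opposite directions: a higher degree shrinks the residual codelength but pays more bits for its parameters, so the minimum lands at a model that fits without overfitting.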

In the **second part** of the tutorial we will **showcase examples** of how we can solve modern data mining problems using MDL. Amongst others we will discuss tasks such as *pattern-based modelling, anomaly detection, causal inference, and model selection in deep learning*.

In the **third part** we will focus on more advanced recent insights in MDL. In particular, we will discuss the concept of **stochastic complexity** and the **normalized maximum likelihood** (NML) codelength, *how to compute it efficiently* for a wide range of model classes, and *how to apply it to latent variable model selection*.
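For the Bernoulli model class, the NML normalizer can be computed exactly with a single sum over the sufficient statistic, which makes for a compact illustration of stochastic complexity. This is a standard textbook construction, sketched here under our own naming, not code from the tutorial.

```python
# NML / stochastic complexity for the Bernoulli model class, a minimal sketch.
from math import comb, log2

def parametric_complexity(n):
    """log2 of the NML normalizer C_n = sum_k C(n,k) (k/n)^k (1 - k/n)^(n-k)."""
    total = 0.0
    for k in range(n + 1):
        p = k / n
        # Python defines 0 ** 0 == 1, which is exactly the convention needed here
        total += comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    return log2(total)

def nml_codelength(x):
    """Stochastic complexity of a binary sequence: -log2 P(x | ML) + log2 C_n."""
    n, k = len(x), sum(x)
    neg_log_ml = 0.0
    if 0 < k < n:
        p = k / n
        neg_log_ml = -(k * log2(p) + (n - k) * log2(1 - p))
    return neg_log_ml + parametric_complexity(n)
```

The exact sum is O(n); for large n or richer model classes one would switch to the efficient recurrences and asymptotic expansions the tutorial covers.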

In the **fourth part** we will **showcase examples** of how modern MDL insights allow us to *solve data mining problems in dynamic settings*. This includes computing change statistics, dynamic model selection, and failure detection.
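As one hedged illustration of the dynamic setting, a change point can be scored by the number of bits saved when a binary stream is encoded as two Bernoulli segments instead of one. The plug-in codelength and the (1/2) log2 n parameter cost below are simplifying assumptions for the sketch, not the tutorial's actual change statistics.

```python
# Sketch of MDL-based change detection on a binary stream: compare the
# codelength of one Bernoulli model against two models split at index t.
from math import log2

def segment_cost(seg):
    """Plug-in codelength of a binary segment plus (1/2) log2 n parameter cost."""
    n, k = len(seg), sum(seg)
    cost = 0.5 * log2(n) if n else 0.0
    if 0 < k < n:
        p = k / n
        cost += -(k * log2(p) + (n - k) * log2(1 - p))
    return cost

def best_change_point(x):
    """Return (t, gain): the split with the largest codelength saving, in bits."""
    no_split = segment_cost(x)
    best_t, best_gain = None, 0.0
    for t in range(1, len(x)):
        gain = no_split - (segment_cost(x[:t]) + segment_cost(x[t:]))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

x = [0] * 50 + [1] * 50  # obvious distribution change at t = 50
print(best_change_point(x))
```

A positive gain means the split compresses better than a single model, so the split itself is evidence for a change; on a stream, the same score can be computed over a sliding window.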

MDLDM will be a **half-day** tutorial on **Saturday, August 4**, at KDD 2019.

The slides and references will be made available by mid-July.

Jilles Vreeken

CISPA Helmholtz Center for Information Security

Kenji Yamanishi

The University of Tokyo
