The sibling rivalry between deep learning and statistical modelling


Amid the explosion of interest in deep learning, conventional statistical methods that have been in use for much longer seem to attract very little attention. Comically, the word “statistics” sits almost at the opposite end of the excitement spectrum, evoking memories of dull college classes rather than exhilarating tech projects and Silicon Valley success. How distant are deep learning and statistics? More importantly, is the enormous interest in the one and the coolness toward the other actually warranted? This is a vast and quickly developing field, so we must caveat that the comparison below extends only to the more classical methods of deep learning, and excludes newer hybrid methods that combine it with statistics.

Very roughly speaking, a key objective of statistical modelling is to use a mathematical framework to formulate a relationship between variables. Equipped with a data set, one determines with what degree of certainty it can be said that a variable can be inferred from others. Based on such demonstrated relationships, and conditional on certain assumptions, we can then make predictions about the future. If the model is sufficiently well designed (to the extent that this is possible), then we make predictions by first answering why the variable behaves in a certain way. One of the objectives of statistical modelling is therefore interpretability, or in other words an attempt to explain the inner workings of a process.
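To make this concrete, here is a minimal sketch of the statistical approach: an ordinary least squares fit on synthetic data. The variables (advertising spend and price as predictors of sales) and the true coefficients are hypothetical, invented purely for illustration; the point is that each fitted coefficient has a direct interpretation.

```python
import numpy as np

# Hypothetical synthetic data: sales as a function of ad spend and price.
rng = np.random.default_rng(0)
n = 200
ad_spend = rng.uniform(0, 10, n)
price = rng.uniform(1, 5, n)
sales = 3.0 + 2.0 * ad_spend - 1.5 * price + rng.normal(0, 1, n)

# Ordinary least squares: fit sales ~ intercept + ad_spend + price.
X = np.column_stack([np.ones(n), ad_spend, price])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Each coefficient answers a "why": beta[1] estimates the change in sales
# per unit of ad spend, holding price fixed. This is the interpretability
# that statistical modelling aims for.
print(beta)
```

Given enough data, the fitted coefficients recover the assumed relationship (roughly 3.0, 2.0, and -1.5), and a full statistical treatment would also attach standard errors and confidence intervals to each one.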

Deep learning, in its classic forms, follows a very different approach. The mathematical concept behind “learning” is actually optimization. We represent a variable as the output of a multi-layered set of relationships, loosely analogous to neurons in a brain (hence the term “neural network”). We feed a data set into this set of relationships, and adjust the parameters of the layers for the best fit to the training data. If we then take another data set to test how well the neural network performs on the specified task, for example image recognition, and end up with a reasonably high level of precision, the network can be used in a production setting. What is crucial to understand is that we never asked why the neural network performs well; it just does. This is what is referred to as the “black box”.
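The same idea in miniature: below is a sketch of a tiny two-layer network trained by gradient descent to fit the XOR function (a standard toy problem chosen here for illustration; the layer sizes and learning rate are arbitrary). "Learning" is nothing more than repeatedly nudging the weights to reduce a loss; the final weights fit the data well, but inspecting them tells us nothing about why.

```python
import numpy as np

# Toy training data: the XOR function, which no single linear model can fit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Randomly initialized parameters of a 2 -> 8 -> 1 network.
rng = np.random.default_rng(1)
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(5000):
    # Forward pass: the layered set of relationships described above.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of cross-entropy loss w.r.t. every parameter.
    grad_z2 = (p - y) / len(X)
    grad_W2 = h.T @ grad_z2; grad_b2 = grad_z2.sum(0)
    grad_z1 = (grad_z2 @ W2.T) * (1 - h ** 2)
    grad_W1 = X.T @ grad_z1; grad_b1 = grad_z1.sum(0)
    # "Learning" = optimization: step each parameter downhill.
    W1 -= 0.5 * grad_W1; b1 -= 0.5 * grad_b1
    W2 -= 0.5 * grad_W2; b2 -= 0.5 * grad_b2

# The trained network reproduces XOR, but the weights explain nothing.
print(np.round(p).ravel())
```

Contrast this with the regression example: there is no coefficient here one can point to and say "this is the effect of the first input", which is exactly the black-box property.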

The interesting question is why deep learning causes so much more excitement than statistical modelling, and to what extent this is justified by business realities. Artificial intelligence has been around since the 1950s, and since then there has already been a cycle of social excitement and disillusionment; we are now experiencing a new peak of enthusiasm. Clearly, the availability of adequate computing power, especially through distributed computing, as well as the gigantic amounts of data generated by online activity, make a much more compelling case this time around. Unlike statistics, machine learning generally starts to work well only with enormous amounts of data, which must then be processed through the model, which in turn requires a lot of computing power. So to an extent the excitement makes sense, because it is still unclear what the potential applications of deep learning will be. Such massive amounts of data and processing power have appeared only very recently, and as a result of the hype there is far more research into new methods and approaches. Statistics has not experienced such a surge in potential or research.

One of the distinctive features of machine learning is that it is empirical: we aren’t preoccupied with why it works; if it works well, we are happy to use it. Such an approach isn’t universally acceptable. When a central bank sets monetary policy, the public expects that it does so based on an understanding of the macroeconomic processes at work, such as the underlying relationships between employment, trade, and investment. This is an instance where it is important to answer the question why, and hence central banks work with statistical modelling. It is ludicrous to imagine a central bank training a neural network on historical data sets and then having it spit out interest rates, with no justification other than that the model performed well in out-of-sample testing! On the other hand, if we consider something like image recognition on Facebook, a lack of understanding of why the deep learning algorithm works is completely acceptable. Consider a student who uploads a graduation photo with 70 fellow students to Facebook, and would like to have them tagged so that their friends and family can also see the picture and like or comment. If the algorithm successfully recognizes 68 students, with 2 missed and 1 mislabeled, the user is clearly satisfied: the additional manual work is minimal, and without the algorithm the task would have been unrealistic. The underlying explanation of why the algorithm worked is of no interest to the user, and of no interest to Facebook as long as the success rates are reliably high. The same logic applies to a technology like Amazon’s Alexa: if Alexa understands a user’s request for information and correctly finds an answer, the user is happy, and if not they can just open their computer and search themselves.

The examples above illustrate another reason why deep learning generates excitement: there is a “something for nothing” factor. In specific cases, models can be built to solve complex tasks automatically. This is conditional on precision being high enough and the stakes being low, or in other words on errors not being costly. If these conditions are met, then onerous manual tasks can be done by a computer at essentially zero cost. Although the hype around machine learning can be excessive, and its potential uses greatly exaggerated, for specific tasks there can indeed be real automation benefits. The same facial recognition can be used for many things, such as the remote ID verification done by providers like Jumio. It is likely that some of the promoted applications of deep learning, like self-driving cars, will not live up to their high expectations. However, as in previous cycles of artificial intelligence, and much more so this time, useful and established applications will gradually emerge.