Why Classic Machine Learning vs. Deep Learning is a Trumpian vs. Pelosian Argument
Politics is the topic that we love to hate. It’s everybody’s dirty little secret that we enjoy a good scandal. Don’t fret though, this isn’t about politics, it’s about Machine Learning — though I suppose it is also a little bit about politics.
Regardless of which side of the political fence you sit on (or whether, like some of us, you work hard to be Swiss and walk the middle line), each political party has its iconic figurehead with a subjective list of good things and bad things. You love to see “the other side” take a few punches or your team vindicated of some crazy exaggeration.
The fact that people can rarely agree on whether specific outcomes are good or bad is precisely at the heart of this discussion of classic versus “new” schools of thought regarding Machine Learning.
When Do Predictions Actually Mean Something?
We love to make predictions. The weather, the stock market, the elections, whether grandma will pick matching socks at Thanksgiving. Predictions based on a bad premise are still predictions, just not very good ones. We don’t always use good, complete, or even logical data to make our own predictions. In fact, we use emotions more often than we like to admit, and that means we are adding our own bias (or *gasp* prejudice) to our decisions.
“What you bring” to the table when making your models has a significant effect on whether your predictions mean something. In fact, it’s quite easy to manipulate data so that it can look like anything you want, and sometimes to find relationships in data that are not naturally occurring. (Take the news coverage on COVID, for example.) Do your predictions mean something to all of your users? Or are they biased toward one of your target demographics? Good data scientists tell good stories about the data, and those stories tend to nudge the recipients toward a perspective of the facts that supports your agenda. (That sounds a little political…)
There are plenty of products that help you “use” machine learning methods for predictions without requiring the “user” to understand how they work. For end-users, that should be ok, assuming the designers have brought a suitable level of discipline to the table and a) really understand the problem they are solving, and b) understand the results they are using. That isn’t always the case, though, and that’s where people get into trouble.
Predictions using Classic Machine Learning Models
We are talking linear and logistic regression, random forests, clustering, PCA, and the like. These algorithms are terse. They often require classical book smarts, maths, and statistics to understand. Many of them are also older than we like to admit. Can we even call these types of algorithms classic? Many still feel shiny and new, research continues to refine them, and they are the backbone of what we have come to know about core machine learning.
I’m going to call these classic methods the “conservative” party of ML algorithms. Why? Well, why not. I’m going to compare the two dominant approaches, so we will start with the old, comfortable and familiar. The “old” (er?) way. The conservative way.
Funnily enough, some classic machine learning can look downright stupid (random forests?), yet science has backed it substantially and it is a solid workhorse. “Ensemble” models are the evolutionary, practical way to force classic ML to “do more.” I’ll see your Support Vector Machines, and raise you a Random Forest. Boosting and weighting are tricks we use to guess our way toward understanding the maths of prediction, and they provide a never-ending mash-up of conservative algorithms, but they are still the same old predictable building blocks. All of us classic ML folk had better keep our skills sharp for testing models for overfitting and reading Receiver Operating Characteristic (ROC) curves.
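For the curious, here is a minimal sketch of what a ROC check boils down to. The area under the ROC curve is the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The `roc_auc` helper and the tiny data set are made up for illustration; in practice you would likely reach for a library routine rather than this brute-force version.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by comparing every
    positive/negative pair of scores (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: no information either way
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores, purely for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 1.0 means the model ranks every positive above every negative, while 0.5 is no better than a coin flip. A model that scores far better on its training data than on held-out data is the usual sign of overfitting.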
Hot Fresh New Deep Learning Modeling
Deep learning, the latest iteration of artificial neural networks, sells the sexy belief that it is modeled after human intelligence. The general idea was inspired by how the human brain was believed to work, though now we know that neural network algorithms don’t work at all like our brains.
An ML approach that doesn’t require all the math to help get results. That sounds amazing. In fact, why don’t we also give away college education for free! Now, this article isn’t intending to argue for or against positions like Bernie’s free university pitch, but my point IS about disrupting the norm. Getting more benefit for putting less into it — it’s either brilliant resource allocation, or it’s a bunch of baloney with a big bill coming due to the next generation of leaders. I’m going to call this approach the liberal side of machine learning. It’s all about embracing the new and forward-thinking. Trying new things that might end up being awesome.
Even the simplest of neural networks requires an understanding of differential calculus and gradient descent algorithms if you intend to explain how it finds its result. Ask any decent computer scientist, and their first reaction is likely to be “yes it is well known and proven how that works, but it’s not worth your time to learn all the details. Just download this library and follow this tutorial.” The truth is that neural networks and their big siblings, deep learning and GANs, totally rock. They are hip, cool, and they deliver the results in terms of accuracy. Here’s a cool paper with lots of gritty details about why Deep Learning works so well. It’s not ALL good news though.
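To show how little code hides that calculus, here is a minimal sketch of the smallest possible "network": one sigmoid neuron trained with batch gradient descent on log loss. Everything in it (the `train_neuron` name, the learning rate, the toy data) is an assumption for illustration, not a recipe from any particular library.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(xs, ys, lr=0.5, epochs=2000):
    """Fit y ~ sigmoid(w * x + b) with batch gradient descent.

    For log loss, the derivative with respect to the pre-activation
    is simply (prediction - target), which is where the calculus hides.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_neuron(xs, ys)
predictions = [round(sigmoid(w * x + b)) for x in xs]
print(w, b, predictions)
```

A deep network repeats this same update across millions of weights, with the chain rule carrying the error backwards through the layers. That repetition is what makes the result so hard to explain by hand.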
So what’s not to like? Accuracy? Good. Work with any kind of data? Pretty good. Not fast enough? Just throw more hardware at it. Not bad. Not much brain thinky to get to an acceptable result? Hooray! We found the problem.
Performance at the Expense of Understanding
Ok, we all know that we have a society of impatient people. People will look for the best deal, but many are willing to pay more to get what they want faster. It makes sense to me that everybody is hopping on the trend, because learning how to get started with deep learning is quite easy, and it helps solve some problems that are quite difficult to solve with classic machine learning, like image and speech recognition.
For some problems, deep learning may be the only reasonable approach, but its fundamental flaw comes down to truly understanding the intuition of the result and the ability to build a model that is truly generalized. It’s sort of like accepting Fox/CNN headlines as true without doing your own research.
Conservative ML algorithms teach you how to expect the answer, and you learn that with bite-sized sets of data like the Iris flowers or the Titanic dataset. Deep learning is more of a jump into the deep end of the pool, where large amounts of data can be described in terms of how it is prepared for use, but then the internals of the neural network mostly function as a black box. Of course, it *could* be explained with enough time and effort, but that argument is also a weakness.
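To make the “expect the answer” point concrete, here is a minimal sketch of about the most conservative model there is: a straight line fit by ordinary least squares. The `fit_line` helper and the numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The whole model is two numbers you can read out loud: each extra unit of x adds `slope` to the prediction. If the fit surprises you, you can trace exactly which data points pulled it there, which is much harder to do with a few million neural network weights.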
Bad uses of data (by developers or data scientists) pump data into a model and accept the result with the highest accuracy, lowest false positive or negative numbers, or the nicest looking ROC curve. The missing piece? The “scientist” part of making a theory, then testing your theory with experiments that yield your best-predicted result. Asking good questions and learning the reasons why some of your tests didn’t result in what you expected. The majority of developers I have spoken to that are jumping into deep learning do so because it’s easy and gives them some results quickly. Developers like this often don’t even care to understand the internals or the methodologies behind the algorithms, and thus become ignorant followers of the publicly available libraries. That is dangerous and similar to reading and regurgitating headlines without any fundamental understanding of the underlying argument. Sure, free education, universal basic income, and free healthcare sound amazing, but if you don’t understand the underlying economics and account for the potential ripple effects, you may be in for a rude awakening when the other shoe drops.
Lazy is Catching On
Blind performance entices a younger generation of people into the deep learning camp. Not all the followers are sheep though. The valuable left turn for the whole industry is the emergence of AutoML. Here, we throw hardware at “guessing” combinations of conservative and liberal ML algorithms. The tools are quite cool and useful to gain insights about what is possible, but ultimately the industry will be judged based on how it uses tools like this. Will people build and blindly accept ensembles from AutoML? Or will they use those results to inform classic scientific research?
So Which is Better?
Well, it certainly would not look good if one of the sides said that the topic shouldn’t be debated. It also doesn’t make sense for one to argue that there isn’t a need for the other.
What do you think about classic vs new ML? Is classic really the conservative side? Is deep learning really the liberal side? Is it really black and white? Politics certainly wants us to think it is black or white. If you are for Trump, then you are against Biden! If you are pro-choice, then you hate Republicans. This kind of on/off thing is crazy, just like thinking that old and new are the only two choices in machine learning.
Classic machine learning can look dumb, and it can be slow and ineffective, but like geopolitics, if a classic model can be applied to the problem, it is very satisfying, predictable, and most importantly, it can be reasoned through to help adjust it as time (and new data) come into the picture.
Deep learning, Generative Adversarial Networks, and the new wave of hypercomplex modeling certainly let us work on new complicated problems that would be quite difficult to address using only classic ML. At the same time, it is being peddled like sugar, with many people not understanding the technology, yet relying on it. It is very difficult to use intuition to adjust performance without many unknown consequences.
A case could be made that NEITHER is better, though an in-depth education in classic ML lends itself to more responsible use of both. Under the covers of “classic” vs “new” school ML, both sides will have their champions, and they often hate people who try to live in both and try to be everybody’s friend.
Which ML school of thought would you call Trumpian? Which would you call Pelosian?
Like politics, your party affiliation or support for a particular politician may be based on a combination of good and bad facts as well as feelings and bias that come from your education, your source for news, and a whole bunch of other things. Where you learned about Machine Learning, and how influential your teacher was, likely pushed you one way or the other. It’s easier than you think, and like controversial political topics, people tend to form camps and can easily adopt binary “I am right and you are wrong” perspectives, utterly destroying the scientific foundation of the premise.
Andrew Schwabe is Founder and Chief Technology Officer for Formotiv, a behavioral intelligence company that helps its customers observe, understand, and predict good and bad behaviors in web applications.