Why you can't just skip Deep Learning.

An interesting post: "Dear Aspiring Data Scientists, Just Skip Deep Learning (For Now)".

Hardly a day goes by that I don't hear the same sentiment expressed in one way or another. The problem is that the people who say it are correct and wrong at the same time.

The problem is terminology. Let's use the terminology defined in "What's the difference between data science, machine learning, and artificial intelligence?"

As long as you are looking at Data Science and Machine Learning, that is, insight and prediction (and your focus is on getting hired), the article is valid. If your goal is a prediction, why not use the simplest method? It is easier to train and to generalize. Furthermore, the article is correct that training for image recognition, voice processing, or computer vision takes a massive amount of data and processing power. This is not where most DS/ML jobs are.

What the article misses, going back to the definitions above, is the difference between prediction and action. As long as the goal of DS/ML is to produce insight and predictions for humans, its arguments are valid. It is when you want to take an action that it all falls apart. Humans can potentially apply their domain knowledge and navigate the predictions to find the optimal action.

But for machines, it is very different. You basically have three choices for determining the best action: apply human-derived rules to the predictions (1980s AI), reduce the problem to an optimization program (linear or convex), or use reinforcement learning to derive a policy that deals with the uncertainty of your predictions. This is, I think, the essence of Software 2.0, or what I like to call Training Driven Development (TrDD). More on this later.
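To make the three choices concrete, here is a minimal sketch on a made-up ordering problem. Everything in it is an assumption for illustration: the prices, the candidate quantities, the 10-unit buffer rule, and the simulator in which the forecast runs low. The third function is a crude stand-in for reinforcement learning (it just picks the action with the best average simulated reward), not an actual RL algorithm.

```python
import random

PRICE, UNIT_COST = 3.0, 1.0            # hypothetical economics of one unit
CANDIDATES = [80, 100, 110, 130, 150]  # hypothetical order quantities

def profit(quantity, demand):
    """Revenue on the units actually sold minus the cost of the units ordered."""
    return PRICE * min(quantity, demand) - UNIT_COST * quantity

def rule_based_action(forecast):
    # Choice 1: a human-derived rule, "order the forecast plus a 10-unit buffer".
    return forecast + 10

def optimized_action(forecast):
    # Choice 2: treat the forecast as exact and maximize profit over the candidates.
    return max(CANDIDATES, key=lambda q: profit(q, forecast))

def learned_action(forecast, episodes=2000, seed=0):
    # Choice 3: trial and error against simulated outcomes.
    # Assumed simulator: real demand lands between 90% and 150% of the forecast.
    rng = random.Random(seed)
    demands = [forecast * rng.uniform(0.9, 1.5) for _ in range(episodes)]

    def average_profit(q):
        return sum(profit(q, d) for d in demands) / episodes

    return max(CANDIDATES, key=average_profit)

forecast = 100
actions = {
    "rules": rule_based_action(forecast),
    "optimization": optimized_action(forecast),
    "learned": learned_action(forecast),
}
print(actions)
```

The point of the toy is only that the first two choices act on the prediction as given, while the third gets to see how the prediction actually plays out and can adjust for it.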

If rules and/or optimization work, then great, you are done. But when that is not an option, then in the model prescribed by the article you need to combine a policy neural network with your ML prediction. The problem now is that you have two islands to deal with: the gradient of the neural network's loss function can't propagate back into your ML prediction model. Simplifications that worked so well when the ML prediction was meant for humans are now amplified as errors in your policy network. I don't know how you can have a loss function that trains the policy and also communicates with, say, a linear regression model's loss function.
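Here is a small sketch of the two islands, with hypothetical names and numbers throughout. A one-parameter linear regression is fit in closed form on its own squared error, then a one-parameter "policy" is trained by gradient descent on a separate task loss. The chain rule does give a gradient for the regression weight, but nothing in the two-stage setup ever applies it. This only shows the blocked gradient, not the error amplification.

```python
def fit_linear(xs, ys):
    """Island 1: least-squares slope through the origin, fit on its own loss."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def task_loss_and_grads(w, theta, x, target_action):
    """Island 2: policy action = theta * prediction, scored by a task loss."""
    prediction = w * x
    action = theta * prediction
    error = action - target_action
    loss = error ** 2
    grad_theta = 2 * error * prediction  # used to train the policy
    grad_w = 2 * error * theta * x       # would reach the predictor end to end
    return loss, grad_theta, grad_w

# Fit the predictor once, on its own data and its own loss.
w = fit_linear([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
w_before = w

theta, x, target_action, lr = 1.0, 2.0, 5.0, 0.01
initial_loss, _, initial_grad_w = task_loss_and_grads(w, theta, x, target_action)

for _ in range(50):
    loss, grad_theta, grad_w = task_loss_and_grads(w, theta, x, target_action)
    theta -= lr * grad_theta  # only the policy parameter moves
    # grad_w is computed here but dropped: the regression is a frozen island.

final_loss, _, _ = task_loss_and_grads(w, theta, x, target_action)
print(w == w_before, initial_grad_w, initial_loss, final_loss)
```

In an end-to-end model the same `grad_w` would simply be applied to `w`, which is exactly the communication between loss functions that the two-stage setup lacks.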

After reading "Optimizing things in the USSR" I ordered the book Red Plenty. It has some interesting observations about what happens when you "simplify assumptions".