The advent of deep learning and shallow thinking

Atmadip

20-04-2024

The following essay is just my personal perspective on, and disgust with, the almost feverish hype around AI and Deep Learning these days…


If you’ve been following the mainstream media for a while, you’ve surely noticed its sickly obsession with news articles about AI, LLMs, Deep Learning, and so on: ‘XYZ LLM achieves above-benchmark performance on this,’ ‘CEO of XYZ says AI will replace all jobs in 6 months,’ etc.

Perhaps these claims are valid (and therefore scary), but the point of this blog is not to validate or refute them, but to highlight some essential ideas and look at all this madness from a bigger picture.

I am a student and a fan of math, ML, economics, and more. I first encountered ML in my first year as an undergrad, and it was mesmerizing. Looking at data and finding hidden patterns seemed really interesting, and it was one of the things I believed held a lot of potential. Then came the idea of Deep Learning, and the moment I created my first neural network, I knew this was crazy. The delta learning rule won my heart. With this, I also started getting exposed to the media coverage of this new technology, and I believed that since it was such a new science, I needed to stay updated on anything and everything that came up. So I started reading books and blindly following these AI gurus and whatever they said, believing Neural Nets were the solution to my life problems. Time passed, I started using neural networks in everything I could find, and for the (real) problems I couldn’t solve, ChatGPT was there.
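
For context, the delta rule that won me over is nothing more than gradient descent on the squared error of a single linear unit. Here is a minimal sketch in Python; the toy data, learning rate, and target weights are all made up purely for illustration:

    import numpy as np

    # Delta (Widrow-Hoff) rule: w <- w + lr * (target - prediction) * x
    # Toy problem: learn y = 2*x1 - 1*x2 from randomly generated inputs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([2.0, -1.0])

    w = np.zeros(2)   # start with zero weights
    lr = 0.05         # learning rate (illustrative choice)
    for epoch in range(50):
        for x_i, y_i in zip(X, y):
            pred = w @ x_i                 # output of the linear unit
            w += lr * (y_i - pred) * x_i   # the delta rule update
    print(w)  # should end up close to [2, -1]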

But in all of this madness and batshit I followed, believing I held the key to a technology that could change the world, there was one thing I regretfully never did: slow thinking.

Science is an interesting subject. Its primary purpose is to understand the world better. Perhaps I believed Neural Networks were the answer to a new age of science, one in which I finally had the head start I desperately needed to succeed.

This might seem ridiculous to you, but nowadays I hear people bragging about working with a 1-billion-parameter LLM, or (smugly) complaining about having to train a model for a million epochs, and so on.

In the end, I have noticed that we as a species have bought so deeply into this idea of neural nets and predicting shit out of some other shit that we have lost the underlying principle of why ML was even invented in its early days.
It was never about giving a computer some inputs and taking its outputs at face value, assuming the so-called ‘black-box’ model is perfect in all respects; it was about understanding the system better: gaining insights about the system by finding patterns in data and generating human-level interpretations from them. Interpreting the results was more important than the results themselves.
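
To make that concrete, here is a tiny, purely illustrative sketch (the data and the ‘sales vs. ad spend’ story are invented for this example): with a transparent model like ordinary least squares, the fitted coefficients are the interpretation, something a human can read, question, and learn from, rather than just a stream of predictions to take at face value.

    import numpy as np

    # Made-up example: "sales" driven mostly by ad spend, barely by store size.
    rng = np.random.default_rng(1)
    ad_spend = rng.uniform(0, 10, 200)
    store_size = rng.uniform(0, 10, 200)
    sales = 3.0 * ad_spend + 0.2 * store_size + rng.normal(0, 1, 200)

    # Ordinary least squares with an intercept column.
    A = np.column_stack([ad_spend, store_size, np.ones(200)])
    coeffs, *_ = np.linalg.lstsq(A, sales, rcond=None)

    # The coefficients themselves are the insight: ad spend matters ~15x more
    # than store size here, which is something a person can reason about.
    print(coeffs)  # roughly [3.0, 0.2, 0.0]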

But looking at today’s world (especially the younger generation), this no longer seems to be the driving philosophy. It seems we have lost this fundamental thought process somewhere and become more focused on fine-tuning our models without really knowing why they performed better or worse than the previous time.

And the advent of deep neural networks has done anything but slow this phenomenon down. Explainability has become an important idea right now, but at the pace AI is moving forward, it is treated as nothing more than a small hurdle to cross.

Who cares about the explainability of the model if it is giving desirable results?

Take ChatGPT: that application’s very existence and popularity is a testament to the above statement. Do we even care how ChatGPT works, or how its much-lauded 175B parameters individually act? Do we care at all?
Do not misinterpret me here; I am just as enthusiastic about this new advent of AI and ML solving a lot of different problems. But I believe the solution is not to blindly put faith in the AI overlords and the models they create, believing that the world’s problems are far less complicated than the benchmarks those models have beaten, but to think deeply about these systems, to read about their history and the very reason such models came into existence in the first place. This gives us a very vivid picture of the assumptions and number crunching that go into these models and, hence, of how their performance stats are nothing but simple jackshit.

A scientist cares about understanding the world, but in industry only results matter. That’s why we see such bullshit hype and excitement about these systems. If you can predict the market right with such-and-such model, you don’t care about its explainability as long as you are making money.

Perhaps you are too busy counting your hard-earned dollars.

Remember, the fundamental idea I want to convey is that these deep networks might be great at understanding and predicting some phenomena, but it is our job, as their creators, to interpret the model’s findings and store them in our wealth of experience. And sadly, this is the difficult part: we have become so fixated on the results that understanding the system and explainability now seem like old-fashioned concerns.
And the new generation is expected to move on with such systems without ever really needing to interpret what the fuck is actually going on inside the model they so dearly believe to be the Oracle.

For someone who is unaware of today’s AI shit, it must be a surprise to see ChatGPT working so well; they might even think that we humans finally understand how natural languages work, because, after all, we created ChatGPT and it is so good at processing Natural Language queries.

A confused user and a black box model

I find it quite funny that we as humans are discarding the old, fundamental approach to doing science and embracing a new world where the model is supposed to predict the results, with all the understanding happening inside the model, elusive even to its creators. I consider this nothing but a tragedy for humanity.

The beauty of science has never been about finding results, but about understanding the world a bit better and, perhaps, the rules that govern this planet. But as we move towards this results-oriented world of data science, understanding is given lower priority than predicting the nearest outcome.

And it pains me to see such a foolhardy population walking around smugly because they got better model performance than most people in a domain where they have no expertise. After all, if you couldn’t interpret or explain how and why the model learned such patterns, what contribution have you made to humanity while pursuing this research?

That’s all for today, I guess… Thank you for reading this (probably) boring and (maybe) immature essay of mine. Even though I wanted to say a lot more, keeping the reader’s time in mind, I’ll wrap it up.

Thank you.