Attention Is Intelligence: What AI Gets Right (and what it never will)

The last time humanity asked AI the ultimate question, it took the death of the universe to get an answer.

In The Last Question, Asimov’s Multivac is asked over and over:
“Can entropy be reversed?”

Each time, the machine responds:
“INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.”

Only after everything (stars, galaxies, and time itself) has collapsed into nothing does Multivac figure it out. And what does it say?
“LET THERE BE LIGHT.”

For decades, we’ve imagined AI as something like this: an advanced machine that eventually outthinks us all. But in reality? Most of the AI we laymen use today is predicting the next word in a sentence.

Don’t get me wrong; I am not saying LLMs are useless, far from it. They might not be decoding the fabric of the universe, but they are doing something weirdly compelling.

They don’t think or reason, and they sure as hell don’t understand like we do. But they do something that makes them feel brilliant.

It all comes down to attention.

Attention makes these models work; it’s how they decide what matters in a sentence, what words to weigh more heavily, and how they pull off responses that almost make you forget they’re just fancy statistical engines.

This week, we will explore what happens inside these language models, what we keep getting wrong about them, and why attention is the foundation of modern AI.

3 (of many) AI Myths That Need a Reality Check

We like to think we understand AI. We use it, talk about it, and watch the headlines; some of us feel we are experts and must have an opinion about every piece of AI news. All good, not hating, lol.

– We assume chatting with ChatGPT is the same as prompting. Nope.

– We assume LLMs think. They don’t.

– We assume AI is just text generation. It isn’t.

If we don’t challenge these assumptions, we will never understand what AI can and cannot do.

  1. Chatting ≠ Full-On Prompting

Sure, we all type into ChatGPT, Claude, or Gemini and call that “prompting.” But that’s just a part of the equation. In the background, a hidden “System Prompt” sets the ground rules for how these models respond, and your typed text becomes one element of the complete prompt.

If you want to see what full-blown “prompting” looks like, try building an AI agent specifically instructed to retrieve data, reason, and act in a structured way. That’s when the real artistry of prompt engineering kicks in. Chatting in plain text is perfectly valid, but it’s not what prompting really is.
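To make that concrete, here is a rough sketch of what a chat request actually looks like under the hood, assuming an OpenAI-style chat API. The system prompt, the model name, and the exact call are illustrative only; every provider words theirs differently.

```python
# A rough sketch of what actually gets sent to the model when you "chat".
# The message roles are standard across most chat APIs; the system prompt and
# model name below are made up for illustration.

messages = [
    {
        "role": "system",  # hidden ground rules you normally never see
        "content": "You are a helpful assistant. Be concise, refuse harmful requests, ...",
    },
    {
        "role": "user",  # the only part you actually typed
        "content": "Stockholm is the capital of...",
    },
]

# With the OpenAI Python SDK the call looks roughly like this (other providers are similar):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(response.choices[0].message.content)
```

Your typed text is just one entry in that list; the system prompt (and often tool definitions and conversation history) ride along with it every single time.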

How easy is it to build an AI agent? Well, a dummy one, not so hard. Hugging Face offers a free course on building AI agents: you get access to their libraries, and they walk you through creating a simple one step by step. The course also covers LLMs at large, prompts, agent tools, and much more, with exams and certifications too. I cannot recommend it enough. You can find it here (PS: Good luck with the app.py file 😭)
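And to show the "retrieve data, reason, act" loop without any real libraries, here is a toy agent I mocked up. Everything in it (the lookup_capital tool, the fake_llm stand-in) is invented purely for illustration; in the course you plug in a real model and real tools instead.

```python
# Toy agent loop to illustrate the structure: the model "thinks", picks a tool,
# observes the result, and answers. The tool and the fake model are invented
# for illustration; a real agent uses an actual LLM and real tools.

def lookup_capital(country: str) -> str:
    """A stand-in 'retrieval' tool. A real agent might call a search API here."""
    return {"Sweden": "Stockholm", "France": "Paris"}.get(country, "unknown")

def fake_llm(prompt: str) -> str:
    """Pretend model: decides whether to call a tool or give a final answer."""
    if "Observation:" not in prompt:
        return "Action: lookup_capital(Sweden)"
    return "Final Answer: The capital of Sweden is Stockholm."

def run_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("Final Answer:"):
            return reply
        # Parse the tool call and feed the observation back into the prompt
        country = reply.split("(")[1].rstrip(")")
        observation = lookup_capital(country)
        prompt += f"\n{reply}\nObservation: {observation}"
    return "No answer within step budget."

print(run_agent("What is the capital of Sweden?"))
```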

  2. LLMs Don’t Think. They Predict.

Large Language Models don’t understand or reason quite like humans. They’re REALLY GOOD guessers who figure out the next word based on patterns they’ve seen in billions of text samples.
So if you write in a new chat with your fav LLM, “Stockholm is the capital of…” that sentence usually ends in “Sweden,” and that’s the word you’ll get back from the model. Don’t believe me? Let’s test it.

Here, I tried this in Gemini. And Gemini didn’t even bother to think I might mean Stockholm is the capital of delicious Kebab. Because the highest probability here is “Sweden.” This is called greedy decoding.

Let’s call Gemini out.

But Gemini sucks in general, so let’s try my best friend, ChatGPT, and see how I trapped it twice 🙂

Also, I taught GPT to cut the crap when replying 😀
The most statistically likely answer. I will explain that below.
Greedy decoding again for my AI friend.

I didn’t forget about Paris. Let’s see how I trapped it again 😭

Here’s how it really works:

  • Every input you type is tokenized (split into smaller pieces).
  • At each step, the model assigns a probability score to every possible next token: what should come next?
  • It then selects the most probable sequence, not based on truth but on statistical likelihood; the sketch below shows these three steps in code. (wet moment for CRO folks 🤣)
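If you want to poke at this yourself, here is a minimal sketch of those three steps using the small open-source GPT-2 model via the Hugging Face transformers library (which you would need installed). GPT-2 is ancient compared to Gemini or GPT-4o, and the exact probabilities on your machine may differ, but the mechanics are the same, and "Sweden" tends to win.

```python
# Peek at the "guessing" directly: tokenize a prompt, score every possible next
# token, and greedily pick the most likely one. GPT-2 is a tiny, old model, but
# the mechanics are the same as in the big chat models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Stockholm is the capital of"
inputs = tokenizer(prompt, return_tensors="pt")                    # 1. tokenize
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

with torch.no_grad():
    logits = model(**inputs).logits                                # 2. a score for every token in the vocabulary

probs = torch.softmax(logits[0, -1], dim=-1)                       # turn scores into probabilities
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>12}  {p.item():.3f}")

# 3. Greedy decoding = always take the single most likely token
print("Greedy pick:", tokenizer.decode(probs.argmax().item()))
```

Swap that greedy argmax for sampling (temperature, top-p) and you get the more "creative", less predictable behaviour you sometimes see in chat interfaces.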

LLMs hallucinate because the truth is irrelevant to them. Statistical coherence is all that matters.

Let that sink in for a second.

We’re wired to believe that words carry meaning, and that language is a way to express thoughts, ideas, and, most importantly, truth. But for an LLM, words are just math. A game of probabilities.

It doesn’t ‘know’ what it’s saying and doesn’t care if it’s right. It picks the most statistically likely sequence. And that’s where it gets uncomfortable.

We rely on language to build trust. To communicate facts. To navigate reality. However, none of that applies when you interact with an LLM. You’re not talking to something that values truth. You’re talking to something that only values what sounds plausible.

Yet, we continue to treat these models like reasoning machines, like they are thinking. But can something really think if the concept of truth does not exist for it?

While some researchers argue that when these models become massive enough, there might be glimmers of emergent reasoning, the standard view is that they lack human-like logic or knowledge.

If you want to explore this more, here are some excellent resources.

Emergent Reasoning in Large Language Models:

  • Emergent Abilities of Large Language Models (2022): This paper discusses how scaling up language models can lead to the sudden appearance of new capabilities, such as performing arithmetic or summarizing passages, which were not present in smaller models. (arxiv.org)
  • Emergent Analogical Reasoning in Large Language Models (2023): Researchers found that models like GPT-3 exhibit a surprising capacity for abstract pattern recognition, solving analogy problems at a level comparable to or surpassing human performance in certain settings. (nature.com)

Hallucinations in Large Language Models:

  • A Survey on Hallucination in Large Language Models (2023): This comprehensive survey introduces a taxonomy of hallucinations in LLMs, explores contributing factors, and discusses challenges unique to LLMs compared to earlier task-specific models. (arxiv.org)
  • Detecting Hallucinations in Large Language Models Using Semantic Consistency (2024): This study proposes statistical methods, including entropy-based uncertainty estimators, to detect hallucinations in LLM outputs, aiming to enhance the reliability of AI-generated content. (nature.com)
  • Hallucination is Inevitable: An Innate Limitation of Large Language Models (2024): The authors argue that hallucinations are an inherent limitation of LLMs, suggesting that while mitigation is possible, complete elimination may not be achievable. (arxiv.org)
  3. AI ≠ Just LLMs (we say AI, but we think just LLMs)

Most people interact with AI through chatbots or chat interfaces, using them as brainstorming tools or to automate small tasks. Thus, it’s easy to think that’s all AI is. (For a broader definition, see my article: Introduction to Natural Language Processing and its Business Applications.)

Some of the most revolutionary AI breakthroughs have nothing to do with predicting words (language models):

– AlphaFold solves protein structures, helping scientists develop new medicines faster than ever. (Read more)

– Boston Dynamics Robots do backflips and handle real-world logistics, proving AI is WAY MORE THAN text. (Check this out)

– Stable Diffusion & MidJourney can generate breathtaking images out of thin air, shifting how we think about creativity. (Try it free)

Yet, despite our misconceptions and the truth about how they work, these models still perform frighteningly well.

So, what’s making them so powerful? It all comes down to attention.

Beyond the Mean

The real breakthrough isn’t more data. It isn’t bigger models. It’s attention.

Before self-attention, AI relied on recurrent neural networks (RNNs) and convolutional neural networks (CNNs), both of which struggled with long-range dependencies. They processed words sequentially or in fixed windows, making it hard to capture relationships across a complete sentence, let alone entire conversations.

In 2017, Attention Is All You Need (Vaswani et al.) changed everything. The paper introduced self-attention, allowing models to simultaneously process all words in a sentence and weigh their relationships dynamically. This meant an LLM no longer had to read the text in order; it could assess every word relative to every other word at once. Context became fluid, making models vastly more coherent, scalable, and robust.
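Here is what that looks like stripped to the bone: a toy scaled dot-product self-attention in NumPy. Real transformers use learned projection matrices for Q, K, and V and run many attention heads in parallel; the random vectors below exist only to show the mechanics.

```python
# Scaled dot-product self-attention in a few lines of NumPy, the core idea from
# "Attention Is All You Need". This is a toy sketch, not a faithful transformer layer.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["Stockholm", "is", "the", "capital", "of"]
d = 8                                     # toy embedding size
X = rng.normal(size=(len(tokens), d))     # pretend token embeddings

# In a real transformer, Q, K and V come from learned weight matrices; here we just reuse X.
Q, K, V = X, X, X

scores = Q @ K.T / np.sqrt(d)                                           # how strongly each word "looks at" every other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
output = weights @ V                                                    # each word becomes a weighted mix of all words

print(np.round(weights, 2))   # a 5x5 grid: every word attends to every word, all at once
```

The key point is in the last two lines: every word's new representation is a weighted mix of all the other words, computed in one shot rather than left to right.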

This is why LLMs feel smarter, not because they understand the language but because they can allocate focus where it matters.

But just like human attention, AI’s attention is limited.

What Even Is Attention?

Imagine you’re at MeasureCamp, surrounded by conversations, background laughter, and constant noise. Yet, the moment someone across the room (maybe Steen Rasmussen?) says your name, your brain zeroes in on it.

That’s selective attention in action, filtering out noise, prioritizing what’s relevant, and locking in key information.

A neural network is a machine learning program or model that makes decisions like the human brain would. It uses processes that mimic how biological neurons work together to identify phenomena, weigh options, and arrive at conclusions.

Neural networks do something similar. Instead of treating all words as equally important, an LLM assigns weight to each token based on its relevance to the context. This is why it can maintain long-range coherence, remembering details from several messages ago, or infer meaning by emphasizing critical words in a sentence.

But just like in a noisy MeasureCamp common area, attention can only stretch so far.

The more words you feed an LLM, the more its focus becomes dispersed. Overload it, and it struggles to prioritize effectively. This is why dumping a massive block of text and expecting ten high-performing LinkedIn posts in return often results in generic output. The model doesn’t “miss” details because it lacks intelligence—it’s simply not designed to focus on everything equally. Especially on shitty Linkedin posts 😂
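A crude way to see the "dispersed focus" problem: attention weights come out of a softmax, so they always sum to 1. The more tokens you cram in, the thinner that budget gets spread. This toy example uses random scores, not a real model, but the arithmetic is the point.

```python
# Toy illustration of attention dilution: softmax weights must sum to 1, so more
# tokens means a smaller average share per token. Random scores, not a real model.
import numpy as np

rng = np.random.default_rng(42)
for n_tokens in (10, 100, 1000):
    scores = rng.normal(size=n_tokens)                      # pretend relevance scores
    weights = np.exp(scores) / np.exp(scores).sum()         # softmax over all tokens
    print(f"{n_tokens:>5} tokens -> max weight {weights.max():.3f}, average weight {weights.mean():.4f}")

# Note: no weight is ever exactly zero. The model never truly ignores a token,
# it just de-prioritizes it, which is why huge text dumps tend to get mushy, generic answers.
```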

This also exposes the flaws in the “ultimate AI prompt guides” circulating on social media, which promise perfect responses with the right magic formula. LLMs don’t work that way. Their power lies in how they allocate attention, not in brute-force instructions. Understanding that distinction changes both how you use AI as a tool and whether you expect it to perform like a human mind.

How Human & AI Attention Are Similar (And Completely Different)

Attention makes both humans and AI function, but not in the same way.

For us, attention is limited by cognitive load. We process the world through experience, memory, and emotion, filtering out what matters and discarding the rest. We can choose to ignore distractions, shift focus, and assign meaning to what we see.

For AI, attention is limited by compute power. Instead of experience, it relies on training data and token weights to determine importance. It can’t ignore things as humans can; it can only de-prioritize them. And most importantly, it doesn’t seek meaning. It optimizes statistical relationships.

That’s why AI is unbeatable at pattern recognition but fails at common sense. It can scan millions of medical images in minutes but can’t explain a joke.

AI doesn’t assign meaning. It just optimizes attention.

And if attention is the foundation of intelligence—human or artificial—then the real challenge isn’t making AI smarter. It’s teaching humans how to focus in a world drowning in information.

We keep asking if AI will change the world.

Maybe the real question is: Can we keep up?

PS: I want to give a shout-out to Jason Packer, who wrote an article two years ago (to which I contributed a bit) exploring the outsider role in the AI world. Give it a read.

This blog article is a part of my newsletter – Beyond the Mean.
