A Look at the "Illusion of Thinking"
In the relentless hype cycle of artificial intelligence, we're often told that we're on a fast track to true Artificial General Intelligence (AGI). But what if the engines driving us there aren't as powerful as they seem? In a fascinating turn, researchers from Apple have published two papers that serve as a crucial reality check on the current state of AI.

These studies have sparked intense debate by suggesting that the impressive capabilities of our models might be more of a sophisticated illusion than genuine intelligence. Let's dive into what they found and why it matters.
GSM-Symbolic: When AI Fails at Basic Math
The first major challenge came in a 2024 paper titled "GSM-Symbolic." Led by researchers Iman Mirzadeh and Mehrdad Farajtabar, the team created a new benchmark to test how well Large Language Models (LLMs) handle mathematical reasoning. Instead of just testing if a model could get the right answer, they tested how robust its reasoning was.
The findings were revealing:
- Fragile Logic: The models' performance dropped significantly when the researchers changed only the numbers in a word problem while keeping the underlying mathematical logic identical. A model that could solve the "2+3" version of a story might fail on the "4+5" version of the very same story (a sketch of this kind of variant generation appears after this list).
- Easily Distracted: When a single, seemingly relevant but ultimately useless piece of information was added to a problem, the performance of all leading AI models plummeted—in some cases by as much as 65%.
- The Core Conclusion: The study strongly suggested that these models aren't performing true logical reasoning. Instead, they are engaging in highly advanced pattern matching, essentially looking for familiar problem structures from their training data to find a solution.
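To make the methodology concrete, here is a minimal sketch of the GSM-Symbolic idea. The template, names, and number ranges are invented for illustration and are not the paper's actual data; the point is that the story and logic stay frozen while the numbers are re-sampled, and an irrelevant "no-op" clause can optionally be appended. A model that truly reasons should be indifferent to both changes.

```python
import random

# Illustrative template: the narrative and the underlying logic never change,
# only the numbers (and an optional irrelevant clause) do.
TEMPLATE = (
    "Liam picked {a} apples in the morning and {b} apples in the afternoon. "
    "{noop}How many apples did Liam pick in total?"
)
NOOP = "Five of the apples were slightly smaller than the rest. "  # irrelevant detail

def make_variant(with_noop: bool = False) -> tuple[str, int]:
    """Return (question, ground-truth answer) for one randomized instance."""
    a, b = random.randint(2, 90), random.randint(2, 90)
    question = TEMPLATE.format(a=a, b=b, noop=NOOP if with_noop else "")
    return question, a + b  # the correct answer follows from the fixed logic

if __name__ == "__main__":
    for flag in (False, True):
        q, answer = make_variant(with_noop=flag)
        print(q, "->", answer)
```

Because the ground-truth answer is recomputed from the template itself, any number of fresh variants can be generated and scored automatically, which is what makes it possible to measure how much a model's accuracy wobbles when nothing but the surface details change.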
This was the first hint that something was amiss under the hood. But it was the follow-up study that truly shook the conversation.
"The Illusion of Thinking": AI Hits a Wall
In June 2025, a paper titled "The Illusion of Thinking," spearheaded by Parshin Shojaee and Iman Mirzadeh, took this investigation a step further. The team tested so-called "Large Reasoning Models" (LRMs), models designed specifically for complex problem-solving, against a set of classic logic puzzles with adjustable difficulty (a short sketch of what that means follows the list), including:
- Towers of Hanoi
- The River Crossing Problem
- Checker Jumping
- Blocks World
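Towers of Hanoi is a useful illustration of what "adjustable difficulty" means here: the puzzle has a single complexity knob (the number of disks), and the optimal solution length grows as 2^n − 1, so each extra disk doubles how long a correct answer must be. The sketch below is a generic reference solver, not the paper's evaluation harness.

```python
# Towers of Hanoi: each extra disk doubles the optimal solution length
# (2**n - 1 moves), so the puzzle's compositional depth can be dialed up precisely.
def hanoi_moves(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Return the optimal move sequence for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, dst, aux)    # park the top n-1 disks on the auxiliary peg
        + [(src, dst)]                       # move the largest disk to the destination
        + hanoi_moves(n - 1, aux, src, dst)  # re-stack the n-1 disks on top of it
    )

if __name__ == "__main__":
    for n in (3, 7, 10, 15):
        moves = hanoi_moves(n)
        assert len(moves) == 2**n - 1
        print(f"{n} disks -> {len(moves)} moves in the optimal solution")
```

With a reference solver like this, a model's proposed move list can be checked step by step against the rules of the puzzle, which is roughly how accuracy on problems of escalating size can be scored.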
The results were nothing short of stunning.
- The "Accuracy Cliff": The models performed well on simpler versions of the puzzles. But as the complexity was dialed up, their performance didn't gracefully decline; it fell off a cliff, dropping dramatically to zero accuracy.
- Paradoxical Scaling: Even more bizarrely, when faced with harder problems, the models often used fewer computational steps (or "thinking tokens"). It was as if, upon recognizing a challenge beyond its capabilities, the AI simply "gave up" rather than trying harder.
- Three Performance Regimes: The researchers identified three distinct zones. At low complexity, standard LLMs sometimes did better. At medium complexity, the specialized LRMs had an edge. But at high complexity, every single model failed completely.
The researchers' conclusion was blunt and powerful: these models create "the illusion of formal reasoning" but are actually performing a brittle form of pattern matching that can be broken by something as simple as changing a name in a puzzle.
The Debate and Apple's Motivation
Naturally, these findings didn't go unchallenged. The scientific community engaged in a vigorous debate. Some critics, like Alex Lawsen in a response titled "The Illusion of the Illusion of Thinking," argued that flaws in the experimental setup—such as using unsolvable versions of the River Crossing problem or token limits that forced models to quit—were to blame, not a fundamental flaw in the models themselves.
This scientific back-and-forth is healthy and necessary. But it's also worth considering the context. Apple has been playing catch-up in the AI race. While its competitors have soared on the AI boom, Apple has proceeded more cautiously. Publishing research that highlights the fundamental weaknesses of the current dominant approach could be a strategic move to reshape the narrative, arguing that a slower, more deliberate path is wiser than the current "scale is all you need" philosophy.
What This Means for the Future of AI
The implications of Apple's research are profound and force us to confront uncomfortable questions:
- Is Real Reasoning Possible? Are current LLM architectures fundamentally incapable of achieving true, generalized reasoning, no matter how large they become?
- The End of Scaling Laws? This research casts doubt on the prevailing "scaling law"—the idea that simply adding more data and more computing power will inevitably lead to greater intelligence.
- A Call for Innovation: If current methods have a hard ceiling, then achieving AGI may require entirely new architectural innovations beyond the transformer models that power today's AI.
Apple’s research doesn't claim that AI is useless; its power as a tool is undeniable. However, it provides a sobering and evidence-based counter-narrative to the relentless hype. It suggests that the path to truly intelligent machines may not be a straight climb upward, and that getting there may require going back to the drawing board.