o3 mini vs DeepSeek R1 vs Google Gemini 2 Think

Gábor Bíró • February 9, 2025


Advances in artificial intelligence increasingly enable LLMs to solve complex mathematical problems. But how well can they handle the logical challenges of an elementary school competition task? In a previous test, I examined the performance of various models, and now, with the release of OpenAI's new o3-mini model, I've run an updated comparison.

Previously, I ran a comprehensive test of the mathematical problem-solving capabilities of the LLMs available at the time. I gave the models a task from a national mathematics competition designed for fifth graders. In that earlier test, only DeepSeek R1 and OpenAI o1 solved the problem correctly; the other models either got completely stuck or reasoned logically but failed to reach the correct answer. The previous LLM comparison test can be found here.

The release of OpenAI's o3-mini model prompted me to run another test. The task again came from a national fifth-grade mathematics competition. Since the previous experiment clearly showed that traditional models (like Claude 3.5 Sonnet, Qwen 2.5, Llama 3.3) are unable to solve this type of task, this time I only compared the previously successful DeepSeek R1 and the newly available Google Gemini 2.0 Flash Thinking Experimental model with OpenAI o3-mini.

I slightly modified the problem text to ensure that the models' performance was not influenced by minor linguistic details. Thus, the test primarily compared their logical reasoning, not their text comprehension skills.

The test task description:

Twelve people are sitting around a table. They are either Knights (L), Thieves (T), or Peasants (P), with at least one of each present. Knights always tell the truth, Thieves always lie, and Peasants behave as follows: a Peasant tells the truth if his left-hand neighbor is a Thief, lies if his left-hand neighbor is a Knight, and may either tell the truth or lie if his left-hand neighbor is a Peasant. Everyone claims that their right-hand neighbor is a Thief. How many Thieves are sitting at the table?

This time, I won't publish the full step-by-step reasoning process, only the solution and the result, as the detailed derivations completely fragmented the previous article (even though the models derived the calculations in very interesting and illustrative ways, and "thought" quite differently).

OpenAI o3-mini

Successful solution on the first try, in 1 minute 21 seconds. After lengthy reasoning, the model explained its solution in a fairly long answer. Although it overlooked the condition that at least one person of each type is sitting at the table, its final conclusion was nevertheless correct:

Let the 12 people around the table be x_1, x_2, …, x_{12} in order (modulo 12). Each person has a type:

L (Knight, who always tells the truth),

T (Thief, who always lies),

P (Peasant, whose behavior depends on their left-hand neighbor):

if their left-hand neighbor is T, they tell the truth,

if their left-hand neighbor is L, they lie,

if their left-hand neighbor is P, they can choose freely (tell the truth or lie).

Everyone claims:

"My right-hand neighbor is T."
For person i, this statement ("x_{i+1} is T") must be true if the speaker tells the truth and false if the speaker lies, according to the following rules:

If x_i = L, the statement must be true, so x_{i+1} is indeed T.

If x_i = T, the statement must be false, so x_{i+1} cannot be T (thus L or P).

If x_i = P, it depends on their left-hand neighbor (x_{i-1}):

If x_{i-1} = T, then x_i tells the truth, so x_{i+1} must be T.

If x_{i-1} = L, then x_i lies, so x_{i+1} cannot be T.

If x_{i-1} = P, then x_i can behave "freely"; in this case, there's no mandatory consequence for the type of x_{i+1}.
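The case analysis above can be written down as a small predicate. This is only a minimal sketch of the rules as the model states them; the function names `statement_ok` and `seating_ok` and the string encoding of a seating (one letter per seat, with the right-hand neighbor being the next letter) are my own assumptions, not part of the model's answer.

```python
def statement_ok(left: str, me: str, right: str) -> bool:
    """Can `me` consistently claim "my right-hand neighbor is a Thief"?"""
    claim_is_true = (right == "T")
    if me == "L":                  # Knight: the claim must be true
        return claim_is_true
    if me == "T":                  # Thief: the claim must be false
        return not claim_is_true
    # Peasant: behavior depends on the left-hand neighbor
    if left == "T":
        return claim_is_true       # must tell the truth
    if left == "L":
        return not claim_is_true   # must lie
    return True                    # left neighbor is P: free choice


def seating_ok(seats: str) -> bool:
    """Check every person's statement around a circular table."""
    n = len(seats)
    return all(
        statement_ok(seats[i - 1], seats[i], seats[(i + 1) % n])
        for i in range(n)
    )


print(seating_ok("TPTLTLTLTLTL"))  # True
print(seating_ok("LLTPTLTLTLTL"))  # False: the first Knight has a Knight on the right
```

Index `i - 1` wraps around to the last seat for `i = 0`, which is what makes the table circular.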

The first, "non-trivial" observation

Let's see what happens if someone is T (Thief):

Suppose x_i = T. Since T always lies, their statement is false, so x_{i+1} cannot be T. So
x_{i+1} ∈ {L, P}.

Let's examine the two possible types for x_{i+1}:

If x_{i+1} = L (Knight), then since a Knight always tells the truth, their statement implies
x_{i+2} = T.

If x_{i+1} = P (Peasant), then their left-hand neighbor x_i is T, so x_{i+1} tells the truth, and therefore also
x_{i+2} = T.
Thus, in every case where x_i = T (Thief), it turns out that two places later, at x_{i+2}, there must also be a T.
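This local step is small enough to check exhaustively. The sketch below, with hypothetical helper names of my own, fixes a Thief in one seat, tries every type for the seat before it and the two seats after it, and collects any combination where both statements are consistent but the seat two places later is not a Thief.

```python
from itertools import product


def statement_ok(left: str, me: str, right: str) -> bool:
    """Can `me` consistently claim "my right-hand neighbor is a Thief"?"""
    claim_is_true = (right == "T")
    if me == "L":
        return claim_is_true
    if me == "T":
        return not claim_is_true
    if left == "T":                # Peasant next to a Thief tells the truth
        return claim_is_true
    if left == "L":                # Peasant next to a Knight lies
        return not claim_is_true
    return True                    # Peasant next to a Peasant is free


def counterexamples() -> list:
    """Windows (x_{i-1}, x_{i+1}, x_{i+2}) with x_i = T that break the claim."""
    found = []
    for before, nxt, after in product("LTP", repeat=3):
        thief_consistent = statement_ok(before, "T", nxt)
        next_consistent = statement_ok("T", nxt, after)
        if thief_consistent and next_consistent and after != "T":
            found.append((before, nxt, after))
    return found


print(counterexamples())  # []
```

An empty list means that, under these rules, a Thief in seat i always forces a Thief in seat i+2.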

Consequence: The parity of the table

Since the number of people at the table is 12 (even), if there is at least one T, then the "T chain" extends to every second person around the circle. In other words:

If x_i = T, then all individuals belonging to the same parity class as i (e.g., if i is odd, then everyone in an odd position will be T) are forced to be T.

However, for a T (Thief), the right-hand neighbor, who belongs to the opposite parity class, cannot be T (since T lies).

Therefore, whenever the table is not made up entirely of Peasants, that is, there is at least one L or T (and an L forces a T on its right), the only possibility is:

One parity class (say, the odd positions) consists entirely of T (Thieves).

The other class (the even positions) contains no T (only L or P elements can be present).

Since with 12 seats, both even and odd positions represent 6 people each, such a (non-"trivial") solution will have exactly 6 Thieves.

The trivial case

Of course, it's also conceivable that everyone is a Peasant (P). In this case:

Everyone's left-hand neighbor is P, so according to the "Peasant rule," they are not obliged to tell the truth, meaning they can choose to lie.

Since the statement is always: "My right-hand neighbor is T" – but the right-hand neighbor is actually P – the statement is false. Every P can thus decide to lie, satisfying the rules.

This "all P" solution has 0 Thieves.

What is the "correct" answer?

In such problems, there's usually an implicit assumption that we are looking for a non-"trivial" (non-degenerate) solution, meaning we assume there is at least one person at the table who is not P (e.g., a Knight or a Thief). This assumption excludes the "all Peasants" solution, and thus, in the non-trivial case, the previous reasoning leads to 6 Thieves (i.e., 6 T).

Final Answer

If we assume a "non-trivial" solution for the table, meaning not everyone is a Peasant, we arrive at the following conclusion:

There are 6 Thieves sitting at the table.
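Since there are only 3^12 = 531,441 possible seatings, the conclusion can also be checked by brute force. The following is a minimal sketch under my own reading of the rules (right-hand neighbor is the next seat, left-hand neighbor the previous one); `thief_counts` and its `require_all_types` flag are hypothetical names introduced here for illustration. It counts the consistent seatings grouped by their number of Thieves, once with the task's "at least one of each" condition and once without it.

```python
from itertools import product


def statement_ok(left: str, me: str, right: str) -> bool:
    """Can `me` consistently claim "my right-hand neighbor is a Thief"?"""
    claim_is_true = (right == "T")
    if me == "L":
        return claim_is_true
    if me == "T":
        return not claim_is_true
    if left == "T":
        return claim_is_true
    if left == "L":
        return not claim_is_true
    return True


def thief_counts(n: int = 12, require_all_types: bool = True) -> dict:
    """Map each possible number of Thieves to the number of consistent seatings."""
    counts = {}
    for seats in product("LTP", repeat=n):
        if require_all_types and len(set(seats)) < 3:
            continue  # the task demands at least one L, one T and one P
        if all(statement_ok(seats[i - 1], seats[i], seats[(i + 1) % n])
               for i in range(n)):
            k = seats.count("T")
            counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


with_rule = thief_counts(require_all_types=True)
without_rule = thief_counts(require_all_types=False)
print(with_rule)     # {6: 124}
print(without_rule)  # {0: 1, 6: 128}
```

With the explicit condition from the task, the all-Peasant seating is excluded and every remaining seating has 6 Thieves, so no extra "non-trivial" assumption is needed to reach the answer.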

DeepSeek R1

Successful solution on the first run, in 4 minutes 1 second. In contrast, the answer was very short, consisting only of this:

The solution involves analyzing the constraints imposed by each type of person (Knight, Thief, Peasant) and their statements. Since everyone claims their right neighbor is a Thief (T), we deduce:

Knights (L) must have a Thief to their right.

Thieves (T) must have a non-Thief (L or P) to their right.

Peasants (P) depend on their left neighbor:

If left neighbor is T, they tell the truth (right neighbor is T).

If left neighbor is L, they lie (right neighbor isn't T).

If left neighbor is P, their statement is ambiguous.

Considering the circular arrangement, the maximum number of Thieves occurs in an alternating pattern where each Thief is separated by a non-Thief. Valid patterns like T-P-T-P... or L-T-L-T... yield 6 Thieves (every other seat). Introducing Peasants doesn't increase the Thief count due to their truth-telling constraints when adjacent to Thieves.

Answer: There are 6 Thieves at the table.

Google Gemini 2.0 Flash Thinking Experimental

Successful solution on the first run in 41 seconds. The answer here is also very short, and the derivation is also shorter compared to the other two models. Another interesting point about Gemini is that it primarily used a "dry" mathematical derivation.

The valid configurations we found are:

L, T, L, T, L, T, L, T, L, T, L, T (6 thieves)

T, L, T, L, T, L, T, L, T, L, T, L (6 thieves)

P, T, L, T, L, T, L, T, L, T, L, T (6 thieves)

T, P, T, L, T, L, T, L, T, L, T, L (6 thieves)

P, T, P, T, P, T, P, T, P, T, P, T (6 thieves)

T, L, T, P, T, L, T, P, T, L, T, P (6 thieves)

In each valid configuration, the number of thieves is 6. Therefore, we conclude that there are 6 thieves at the table.
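The six seatings listed above can be checked mechanically. This is a minimal sketch with helper names of my own (`statement_ok`, `check`), assuming the right-hand neighbor is the next seat in each list. For every seating it reports whether all twelve statements are consistent, whether all three types are present, and how many Thieves there are.

```python
def statement_ok(left: str, me: str, right: str) -> bool:
    """Can `me` consistently claim "my right-hand neighbor is a Thief"?"""
    claim_is_true = (right == "T")
    if me == "L":
        return claim_is_true
    if me == "T":
        return not claim_is_true
    if left == "T":
        return claim_is_true
    if left == "L":
        return not claim_is_true
    return True


def check(seats: str) -> tuple:
    """Return (statements consistent, all three types present, Thief count)."""
    n = len(seats)
    consistent = all(
        statement_ok(seats[i - 1], seats[i], seats[(i + 1) % n])
        for i in range(n)
    )
    return consistent, set(seats) == {"L", "T", "P"}, seats.count("T")


configs = [
    "LTLTLTLTLTLT",
    "TLTLTLTLTLTL",
    "PTLTLTLTLTLT",
    "TPTLTLTLTLTL",
    "PTPTPTPTPTPT",
    "TLTPTLTPTLTP",
]
results = [check(seats) for seats in configs]
for seats, result in zip(configs, results):
    print(seats, result)
```

Under these assumptions, all six seatings are consistent and contain 6 Thieves, but the first, second and fifth contain only two of the three types, so strictly speaking they do not satisfy the "at least one of each" condition. The answer of 6 is unaffected.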

Final Answer: The final answer is 6
