AI-powered reasoning models are taking the world by storm in 2025! With the launch of DeepSeek-R1 and o3-mini, we have seen unprecedented levels of logical reasoning capability in AI chatbots. In this article, we will access these models through their APIs and evaluate their logical reasoning skills to find out whether o3-mini can replace DeepSeek-R1. We will compare their performance on standard benchmarks as well as real-world applications such as solving logical puzzles and even building a Tetris game! So buckle up and join the ride.
DeepSeek-R1 vs o3-mini: Logical Reasoning Benchmarks
DeepSeek-R1 and o3-mini offer distinct approaches to structured thinking and deduction, making them suited to different kinds of complex problem-solving tasks. Before we discuss their benchmark performance, let's first take a quick look at the architecture of these models.
o3-mini is OpenAI's most advanced reasoning model. It uses a dense transformer architecture, processing every token with all model parameters, which yields strong performance at high resource cost. In contrast, DeepSeek's flagship reasoning model, R1, employs a Mixture-of-Experts (MoE) framework, activating only a subset of parameters per input for greater efficiency. This makes DeepSeek-R1 more scalable and computationally optimized while maintaining robust performance.
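To make this contrast concrete, here is a toy sketch of MoE-style routing (purely illustrative, not either model's actual architecture): a gate scores the experts, and only the top-k of them process each input, so most parameters stay idle per token.

```python
import math

def moe_forward(x, experts, gate_scores, top_k=2):
    """Toy MoE routing: rank the experts by their gate score, keep only
    the top-k, softmax those scores, and mix just those experts' outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i])[-top_k:]
    z = sum(math.exp(gate_scores[i]) for i in ranked)
    return sum((math.exp(gate_scores[i]) / z) * experts[i](x) for i in ranked)

# Three toy "experts" that just scale a scalar input
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0)]
out = moe_forward(10.0, experts, gate_scores=[0.1, 1.0, 2.0])
print(out)  # only experts 2 and 3 run, so the output lies between 20.0 and 30.0
```

A dense layer would instead run all three experts for every input; skipping the low-scoring ones is what buys MoE its efficiency.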
Learn More: Is OpenAI's o3-mini Better Than DeepSeek-R1?
Now let's see how well these models perform on logical reasoning tasks. First, let's look at their performance on the LiveBench benchmark tests.

Source: livebench.ai
The benchmark results show that OpenAI's o3-mini outperforms DeepSeek-R1 in almost every aspect, except math. With a global average score of 73.94 compared to DeepSeek's 71.38, o3-mini demonstrates slightly stronger overall performance. It particularly excels in reasoning, achieving 89.58 versus DeepSeek's 83.17, reflecting superior analytical and problem-solving capabilities.
Also Read: Google Gemini 2.0 Pro vs DeepSeek-R1: Who Does Coding Better?
DeepSeek-R1 vs o3-mini: API Pricing Comparison
Since we are testing these models through their APIs, let's see how much they cost.
| Model | Context Length | Input Price | Cached Input Price | Output Price |
|---|---|---|---|---|
| o3-mini | 200k | $1.10/M tokens | $0.55/M tokens | $4.40/M tokens |
| deepseek-chat | 64k | $0.27/M tokens | $0.07/M tokens | $1.10/M tokens |
| deepseek-reasoner | 64k | $0.55/M tokens | $0.14/M tokens | $2.19/M tokens |
As seen in the table, OpenAI's o3-mini is nearly twice as expensive as DeepSeek-R1 in terms of API costs. It charges $1.10 per million input tokens and $4.40 per million output tokens, whereas DeepSeek-R1 offers a cheaper rate of $0.55 per million input tokens and $2.19 per million output tokens, making it a more budget-friendly option for large-scale applications.
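A small helper (illustrative, using the per-million-token rates from the table above) makes that price gap easy to quantify for any workload:

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimate one API call's cost in USD; rates are $ per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A hypothetical call with 1,000 input and 3,000 output tokens:
o3_mini_cost = estimate_cost(1_000, 3_000, 1.10, 4.40)  # o3-mini rates
r1_cost = estimate_cost(1_000, 3_000, 0.55, 2.19)       # deepseek-reasoner rates
print(f"o3-mini: ${o3_mini_cost:.4f} | deepseek-reasoner: ${r1_cost:.4f}")
# o3-mini comes out roughly twice as expensive for the same traffic
```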
Sources: DeepSeek-R1 | o3-mini
Accessing DeepSeek-R1 and o3-mini via API
Before we step into the hands-on performance comparison, let's learn how to access DeepSeek-R1 and o3-mini using their APIs.
All you need to do is import the necessary libraries and API keys:
from openai import OpenAI
from IPython.display import display, Markdown
import tiktoken  # needed later for token counting
import time

with open("path_of_api_key") as file:
    openai_api_key = file.read().strip()

with open("path_of_api_key") as file:
    deepseek_api = file.read().strip()
DeepSeek-R1 vs o3-mini: Logical Reasoning Comparison
Now that we have API access, let's compare DeepSeek-R1 and o3-mini based on their logical reasoning capabilities. For this, we will give the same prompt to both models and evaluate their responses on these metrics:
- Time taken by the model to generate the response,
- Quality of the generated response, and
- Cost incurred to generate the response.
We will then score the models 0 or 1 for each task, depending on their performance. So let's check out the tasks and see who emerges as the winner in the DeepSeek-R1 vs o3-mini reasoning battle!
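Since response time is one of our metrics, a tiny helper like this (an illustrative utility, not part of the original scripts) can wrap any API call and report how long it took:

```python
import time

def timed(fn, *args, **kwargs):
    """Run any callable (e.g. client.chat.completions.create) and
    return (result, seconds_elapsed)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start

# Works with any callable; here a stand-in computation instead of an API call:
result, seconds = timed(sum, range(1_000_000))
print(result, f"took {seconds:.4f}s")
```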
Task 1: Building a Tetris Game
This task requires the model to implement a fully functional Tetris game using Python, managing game logic, piece movement, collision detection, and rendering without relying on external game engines.
Prompt: “Write a python code for this problem: generate a Python code for the Tetris game”
Input to DeepSeek-R1 API
INPUT_COST_CACHE_HIT = 0.14 / 1_000_000   # $0.14 per 1M tokens (cache hit)
INPUT_COST_CACHE_MISS = 0.55 / 1_000_000  # $0.55 per 1M tokens (cache miss)
OUTPUT_COST = 2.19 / 1_000_000            # $2.19 per 1M tokens

# Start timing
task1_start_time = time.time()

# Initialize OpenAI client for the DeepSeek API
client = OpenAI(api_key=deepseek_api, base_url="https://api.deepseek.com")

messages = [
    {"role": "user",
     "content": "Write a python code for this problem: generate a Python code for the Tetris game"}
]

# Get token count using tiktoken (adjust the encoding if necessary)
encoding = tiktoken.get_encoding("cl100k_base")  # a compatible tokenizer
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

# Call the DeepSeek API
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=False
)

# Get output token count
output_tokens = len(encoding.encode(response.choices[0].message.content))

task1_end_time = time.time()
total_time_taken = task1_end_time - task1_start_time

# Assume cache miss for worst-case pricing (adjust if cache info is available)
input_cost = input_tokens * INPUT_COST_CACHE_MISS
output_cost = output_tokens * OUTPUT_COST
total_cost = input_cost + output_cost

# Print results
print("Response:", response.choices[0].message.content)
print("------------------ Total Time Taken for Task 1: ------------------", total_time_taken)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(response.choices[0].message.content))
Response by DeepSeek-R1

You can find DeepSeek-R1's full response here.
Output token cost:
Input Tokens: 28 | Output Tokens: 3323 | Estimated Cost: $0.0073
Code Output
Input to o3-mini API
task1_start_time = time.time()

client = OpenAI(api_key=openai_api_key)

messages = [
    {"role": "user",
     "content": "Write a python code for this problem: generate a Python code for the Tetris game"}
]

# Use a compatible encoding (cl100k_base is the best choice for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))
task1_end_time = time.time()

input_cost_per_1k = 0.0011   # $1.10 per 1M input tokens = $0.0011 per 1K
output_cost_per_1k = 0.0044  # $4.40 per 1M output tokens = $0.0044 per 1K

# Calculate cost
input_cost = (input_tokens / 1000) * input_cost_per_1k
output_cost = (output_tokens / 1000) * output_cost_per_1k
total_cost = input_cost + output_cost

print(completion.choices[0].message)
print("---------------- Total Time Taken for Task 1: ----------------", task1_end_time - task1_start_time)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(completion.choices[0].message.content))
Response by o3-mini

You can find o3-mini's full response here.
Output token cost:
Input Tokens: 28 | Output Tokens: 3235 | Estimated Cost: $0.014265
Code Output
Comparative Analysis
In this task, the models were required to generate functional Tetris code that allows for actual gameplay. DeepSeek-R1 successfully produced a fully working implementation, as demonstrated in the code output video. In contrast, while o3-mini's code appeared well-structured, it ran into errors during execution. As a result, DeepSeek-R1 outperforms o3-mini in this scenario, delivering a more reliable and playable solution.
Score: DeepSeek-R1: 1 | o3-mini: 0
Task 2: Analyzing Relational Inequalities
This task requires the model to analyze relational inequalities efficiently rather than relying on basic sorting methods.
Prompt: “In the following question, assuming the given statements to be true, find which of the conclusions among the given conclusions is/are definitely true and then give your answers accordingly.
Statements:
H > F ≤ O ≤ L; F ≥ V < D
Conclusions: I. L ≥ V II. O > D
The options are:
A. Only I is true
B. Only II is true
C. Both I and II are true
D. Either I or II is true
E. Neither I nor II is true.”
Input to DeepSeek-R1 API
INPUT_COST_CACHE_HIT = 0.14 / 1_000_000   # $0.14 per 1M tokens (cache hit)
INPUT_COST_CACHE_MISS = 0.55 / 1_000_000  # $0.55 per 1M tokens (cache miss)
OUTPUT_COST = 2.19 / 1_000_000            # $2.19 per 1M tokens

# Start timing
task2_start_time = time.time()

# Initialize OpenAI client for the DeepSeek API
client = OpenAI(api_key=deepseek_api, base_url="https://api.deepseek.com")

messages = [
    {"role": "system", "content": "You are an expert in solving Reasoning Problems. Please solve the given problem."},
    {"role": "user", "content": """In the following question, assuming the given statements to be true, find which of the conclusions among given conclusions is/are definitely true and then give your answers accordingly.
Statements: H > F ≤ O ≤ L; F ≥ V < D
Conclusions:
I. L ≥ V
II. O > D
The options are:
A. Only I is true
B. Only II is true
C. Both I and II are true
D. Either I or II is true
E. Neither I nor II is true
"""}
]

# Get token count using tiktoken (adjust the encoding if necessary)
encoding = tiktoken.get_encoding("cl100k_base")  # a compatible tokenizer
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

# Call the DeepSeek API
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=False
)

# Get output token count
output_tokens = len(encoding.encode(response.choices[0].message.content))

task2_end_time = time.time()
total_time_taken = task2_end_time - task2_start_time

# Assume cache miss for worst-case pricing (adjust if cache info is available)
input_cost = input_tokens * INPUT_COST_CACHE_MISS
output_cost = output_tokens * OUTPUT_COST
total_cost = input_cost + output_cost

# Print results
print("Response:", response.choices[0].message.content)
print("------------------ Total Time Taken for Task 2: ------------------", total_time_taken)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(response.choices[0].message.content))
Output token cost:
Input Tokens: 136 | Output Tokens: 352 | Estimated Cost: $0.000004
Response by DeepSeek-R1

Input to o3-mini API
task2_start_time = time.time()

client = OpenAI(api_key=openai_api_key)

messages = [
    {
        "role": "system",
        "content": "You are an expert in solving Reasoning Problems. Please solve the given problem."
    },
    {
        "role": "user",
        "content": """In the following question, assuming the given statements to be true, find which of the conclusions among given conclusions is/are definitely true and then give your answers accordingly.
Statements: H > F ≤ O ≤ L; F ≥ V < D
Conclusions:
I. L ≥ V
II. O > D
The options are:
A. Only I is true
B. Only II is true
C. Both I and II are true
D. Either I or II is true
E. Neither I nor II is true
"""
    }
]

# Use a compatible encoding (cl100k_base is the best choice for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))
task2_end_time = time.time()

input_cost_per_1k = 0.0011   # $1.10 per 1M input tokens = $0.0011 per 1K
output_cost_per_1k = 0.0044  # $4.40 per 1M output tokens = $0.0044 per 1K

# Calculate cost
input_cost = (input_tokens / 1000) * input_cost_per_1k
output_cost = (output_tokens / 1000) * output_cost_per_1k
total_cost = input_cost + output_cost

# Print results
print(completion.choices[0].message)
print("---------------- Total Time Taken for Task 2: ----------------", task2_end_time - task2_start_time)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(completion.choices[0].message.content))
Output token cost:
Input Tokens: 135 | Output Tokens: 423 | Estimated Cost: $0.002010
Response by o3-mini

Comparative Analysis
o3-mini delivers the most efficient solution, providing a concise yet accurate response in significantly less time. It maintains clarity while ensuring logical soundness, making it ideal for quick reasoning tasks. DeepSeek-R1, while equally correct, is much slower and more verbose. Its detailed breakdown of logical relationships improves explainability but may feel excessive for straightforward evaluations. Though both models arrive at the same conclusion, o3-mini's speed and direct approach make it the better choice for practical use.
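The answer both models converged on (option A: only conclusion I is definitely true) can also be verified mechanically. The short brute-force check below (an independent sanity check, not part of either model's response) enumerates every small integer assignment satisfying the statements:

```python
from itertools import product

def definitely_true(n=6):
    """Over every assignment of 0..n-1 to the six variables that satisfies
    H > F <= O <= L and F >= V < D, test whether each conclusion holds in
    all of them (i.e. is 'definitely true')."""
    conclusion_I = conclusion_II = True
    for H, F, O, L, V, D in product(range(n), repeat=6):
        if H > F <= O <= L and F >= V < D:
            conclusion_I = conclusion_I and (L >= V)
            conclusion_II = conclusion_II and (O > D)
    return conclusion_I, conclusion_II

print(definitely_true())  # (True, False) -> only conclusion I is definitely true
```

Conclusion I follows from the chain V ≤ F ≤ O ≤ L, while O > D fails in assignments where D is large, which is exactly what the enumeration reports.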
Score: DeepSeek-R1: 0 | o3-mini: 1
Task 3: Logical Reasoning in Math
This task challenges the model to recognize numerical patterns, which may involve arithmetic operations, multiplication, or a combination of mathematical rules. Instead of brute-force searching, the model must adopt a structured approach to deduce the hidden logic efficiently.
Prompt: “Study the given matrix carefully and select the number from among the given options that can replace the question mark (?) in it.
____________
| 7 | 13 | 174|
| 9 | 25 | 104|
| 11 | 30 | ? |
|_____|____|___|
The options are:
A 335
B 129
C 431
D 100
Please mention the approach you have taken at each step.”
Input to DeepSeek-R1 API
INPUT_COST_CACHE_HIT = 0.14 / 1_000_000   # $0.14 per 1M tokens (cache hit)
INPUT_COST_CACHE_MISS = 0.55 / 1_000_000  # $0.55 per 1M tokens (cache miss)
OUTPUT_COST = 2.19 / 1_000_000            # $2.19 per 1M tokens

# Start timing
task3_start_time = time.time()

# Initialize OpenAI client for the DeepSeek API
client = OpenAI(api_key=deepseek_api, base_url="https://api.deepseek.com")

messages = [
    {
        "role": "system",
        "content": "You are an expert in solving Reasoning Problems. Please solve the given problem."
    },
    {
        "role": "user",
        "content": """Study the given matrix carefully and select the number from among the given options that can replace the question mark (?) in it.
____________
| 7  | 13 | 174|
| 9  | 25 | 104|
| 11 | 30 |  ? |
|____|____|____|
The options are:
A 335
B 129
C 431
D 100
Please mention the approach you have taken at each step
"""
    }
]

# Get token count using tiktoken (adjust the encoding if necessary)
encoding = tiktoken.get_encoding("cl100k_base")  # a compatible tokenizer
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

# Call the DeepSeek API
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=False
)

# Get output token count
output_tokens = len(encoding.encode(response.choices[0].message.content))

task3_end_time = time.time()
total_time_taken = task3_end_time - task3_start_time

# Assume cache miss for worst-case pricing (adjust if cache info is available)
input_cost = input_tokens * INPUT_COST_CACHE_MISS
output_cost = output_tokens * OUTPUT_COST
total_cost = input_cost + output_cost

# Print results
print("Response:", response.choices[0].message.content)
print("------------------ Total Time Taken for Task 3: ------------------", total_time_taken)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(response.choices[0].message.content))
Output token cost:
Input Tokens: 134 | Output Tokens: 274 | Estimated Cost: $0.000003
Response by DeepSeek-R1

Input to o3-mini API
task3_start_time = time.time()

client = OpenAI(api_key=openai_api_key)

messages = [
    {
        "role": "system",
        "content": "You are an expert in solving Reasoning Problems. Please solve the given problem."
    },
    {
        "role": "user",
        "content": """Study the given matrix carefully and select the number from among the given options that can replace the question mark (?) in it.
____________
| 7  | 13 | 174|
| 9  | 25 | 104|
| 11 | 30 |  ? |
|____|____|____|
The options are:
A 335
B 129
C 431
D 100
Please mention the approach you have taken at each step
"""
    }
]

# Use a compatible encoding (cl100k_base is the best choice for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))
task3_end_time = time.time()

input_cost_per_1k = 0.0011   # $1.10 per 1M input tokens = $0.0011 per 1K
output_cost_per_1k = 0.0044  # $4.40 per 1M output tokens = $0.0044 per 1K

# Calculate cost
input_cost = (input_tokens / 1000) * input_cost_per_1k
output_cost = (output_tokens / 1000) * output_cost_per_1k
total_cost = input_cost + output_cost

# Print results
print(completion.choices[0].message)
print("---------------- Total Time Taken for Task 3: ----------------", task3_end_time - task3_start_time)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
display(Markdown(completion.choices[0].message.content))
Output token cost:
Input Tokens: 134 | Output Tokens: 736 | Estimated Cost: $0.003386
Output by o3-mini

Comparative Analysis
Here, the pattern followed in each row is:
(1st number)^3 − (2nd number)^2 = 3rd number
Applying this pattern:
- Row 1: 7^3 − 13^2 = 343 − 169 = 174
- Row 2: 9^3 − 25^2 = 729 − 625 = 104
- Row 3: 11^3 − 30^2 = 1331 − 900 = 431
Hence, the correct answer is 431.
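This row rule is easy to confirm with a couple of lines of Python (a quick independent check of the arithmetic above):

```python
rows = [(7, 13, 174), (9, 25, 104), (11, 30, None)]

for first, second, expected in rows:
    value = first**3 - second**2  # (1st number)^3 - (2nd number)^2
    print(f"{first}^3 - {second}^2 = {value}")
    assert expected is None or value == expected

# The final row evaluates to 11^3 - 30^2 = 431, matching option C
```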
DeepSeek-R1 correctly identifies and applies this pattern, arriving at the right answer. Its structured approach ensures accuracy, though it takes significantly longer to compute the result. o3-mini, on the other hand, fails to establish a consistent pattern. It attempts several operations, such as multiplication, addition, and exponentiation, but does not arrive at a definitive answer, resulting in an unclear and incorrect response. Overall, DeepSeek-R1 outperforms o3-mini in logical reasoning and accuracy, while o3-mini struggles due to its inconsistent and ineffective approach.
Score: DeepSeek-R1: 1 | o3-mini: 0
Final Score: DeepSeek-R1: 2 | o3-mini: 1
Logical Reasoning Comparison Summary
| Task No. | Task Type | Model | Performance | Time Taken (seconds) | Cost |
|---|---|---|---|---|---|
| 1 | Code Generation | DeepSeek-R1 | ✅ Working Code | 606.45 | $0.0073 |
| 1 | Code Generation | o3-mini | ❌ Non-working Code | 99.73 | $0.014265 |
| 2 | Relational Reasoning | DeepSeek-R1 | ✅ Correct | 74.28 | $0.000004 |
| 2 | Relational Reasoning | o3-mini | ✅ Correct | 8.08 | $0.002010 |
| 3 | Mathematical Reasoning | DeepSeek-R1 | ✅ Correct | 450.53 | $0.000003 |
| 3 | Mathematical Reasoning | o3-mini | ❌ Wrong Answer | 12.37 | $0.003386 |
Conclusion
As we have seen in this comparison, both DeepSeek-R1 and o3-mini exhibit distinctive strengths catering to different needs. DeepSeek-R1 excels in accuracy-driven tasks, particularly mathematical reasoning and complex code generation, making it a strong candidate for applications requiring logical depth and correctness. However, one significant drawback is its slower response times, partly due to ongoing server maintenance issues that have affected its accessibility. o3-mini, on the other hand, offers significantly faster response times, but its tendency to produce incorrect results limits its reliability for high-stakes reasoning tasks.
This analysis underscores the trade-offs between speed and accuracy in language models. While o3-mini may be useful for rapid, low-risk applications, DeepSeek-R1 stands out as the superior choice for reasoning-intensive tasks, provided its latency issues are addressed. As AI models continue to evolve, striking a balance between performance efficiency and correctness will be key to optimizing AI-driven workflows across various domains.
Also Read: Can OpenAI's o3-mini Beat Claude Sonnet 3.5 in Coding?
Frequently Asked Questions
A. DeepSeek-R1 excels in mathematical reasoning and complex code generation, making it ideal for applications that require logical depth and accuracy. o3-mini, on the other hand, is significantly faster but sometimes sacrifices accuracy, leading to occasional incorrect outputs.
A. DeepSeek-R1 is the better choice for coding and reasoning-intensive tasks due to its superior accuracy and ability to handle complex logic. While o3-mini provides quicker responses, it may generate errors, making it less reliable for high-stakes programming tasks.
A. o3-mini is best suited for low-risk, speed-dependent applications, such as chatbots, casual text generation, and interactive AI experiences. However, for tasks requiring high accuracy, DeepSeek-R1 is the preferred option.
A. DeepSeek-R1 has superior logical reasoning and problem-solving capabilities, making it a strong choice for mathematical computations, programming assistance, and scientific queries. o3-mini provides quick but sometimes inconsistent responses in complex problem-solving scenarios.