DeepSeek-V3 delivers near state-of-the-art quality on your own server.

Bíró Gábor, January 9, 2025
6 min read

In the world of AI, closed system models like GPT-4 or Claude Sonnet have dominated the high-end solutions market so far, but accessing them often comes with high costs and limited options. However, the arrival of DeepSeek-V3 has opened a new era: this open-source language model not only offers competitive performance against the most well-known closed models, but also provides the opportunity to run it within your own infrastructure.

Source: author's own illustration

DeepSeek is a Chinese artificial intelligence company making significant advances in the field of large language models. It occupies a particularly interesting position among AI developers because, unlike most of its high-end competitors, it also releases open-source models.

DeepSeek-V3 is an advanced artificial intelligence (AI) model developed by DeepSeek. It belongs to the latest generation of language models and can be applied in various fields, such as natural language processing, data analysis, and even creative content generation. The goal of DeepSeek-V3 is to provide efficient and accurate responses to users, while successive releases continue to improve and adapt to changing needs.

Main Features

  1. Architecture and Efficiency

    • DeepSeek-V3 uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, of which only about 37 billion are activated per token. This sparse-activation technique reduces computational demands while maintaining high performance.

      • Multi-Head Latent Attention (MLA): Improves understanding of text context by compressing key-value representations.

      • Auxiliary-Loss-Free Load Balancing: Provides efficient load distribution without performance degradation.

      • Multi-Token Prediction (MTP): Allows simultaneous prediction of multiple tokens, increasing inference speed by 1.8 times.
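The sparse routing idea behind MoE can be sketched in a few lines: a gate scores every expert, only the top-k run, and their outputs are mixed. This is a toy illustration, not DeepSeek's actual gating code; the expert functions, gate vectors, and `topk_moe_forward` name are invented for clarity.

```python
import math

def topk_moe_forward(x, experts, gate_weights, k=2):
    """Route input x to its top-k experts (toy Mixture-of-Experts sketch).

    Only k of len(experts) expert functions actually run, which is why a
    671B-parameter MoE model can activate only ~37B parameters per token.
    """
    # Gate score per expert: dot product of x with that expert's gate vector.
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Select the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores gives the mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    # Weighted sum of the chosen experts' outputs; unselected experts stay idle.
    out = [0.0] * len(x)
    for i, e in zip(top, exps):
        y = experts[i](x)
        out = [o + (e / total) * yi for o, yi in zip(out, y)]
    return out, top
```

In a real model the gate is a learned layer and the experts are feed-forward networks, but the cost structure is the same: compute scales with k, not with the total number of experts.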

  2. Cost Efficiency

    • The model was trained on 14.8 trillion tokens in roughly 55 days at a cost of about $5.58 million. This is far lower than the figures reported for competitors: training GPT-4, for example, reportedly cost over $100 million.

      • FP8 Mixed Precision Training: DeepSeek-V3 uses FP8 mixed-precision arithmetic by default, a strategy developed to balance performance and memory usage while minimizing accuracy loss. Alongside the base FP8 format, certain sensitive operations (e.g., parts of the attention computation) use wider formats such as E5M6 to preserve accuracy. For maximum accuracy, DeepSeek-V3 can also run in higher precision (e.g., FP16 or BF16), although this significantly increases memory requirements.

      • Optimized Training Frameworks: Utilizes pipeline parallelization and fine-grained quantization techniques.
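As a rough illustration of what FP8 quantization does, the sketch below rounds a value to an E4M3-style grid (4 exponent bits, 3 mantissa bits). It is a simplified model that ignores NaNs, subnormals, and the reserved encodings of real FP8 formats; the function name and the bias default are illustrative assumptions.

```python
import math

def quantize_fp8_e4m3(x, exp_bits=4, man_bits=3, bias=7):
    """Round x to an FP8 E4M3-style grid (sketch; ignores NaN/subnormals).

    Keeping only 3 mantissa bits bounds the relative rounding error at
    roughly 2**-(man_bits + 1) ~= 6.25%, while using a quarter of the
    memory of FP32 per value.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    # Exponent of |x|, clamped to the representable range.
    e = math.floor(math.log2(abs(x)))
    e = max(min(e, (1 << exp_bits) - 1 - bias), 1 - bias)
    # Mantissa in [1, 2), rounded to man_bits fractional bits.
    m = round(abs(x) / 2.0**e * (1 << man_bits)) / (1 << man_bits)
    return sign * m * 2.0**e
```

For example, 0.1 lands on the nearest grid point 0.1015625, a relative error of about 1.6%; the mixed-precision trick is to accept this error where activations tolerate it and keep wider formats where they do not.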

  3. Open Source Access

    • DeepSeek-V3's code and model weights are openly available on platforms such as GitHub and Hugging Face. This allows smaller companies and researchers to leverage cutting-edge technology without facing prohibitive costs.

Performance and Competitors

DeepSeek-V3 excels in several metrics:

  • Mathematics and Programming: Outperforms open models, and rivals or beats closed models, on benchmarks such as MATH-500 and LiveCodeBench.

  • Language and Logical Abilities: Competes with GPT-4o and Claude 3.5 Sonnet models, particularly excelling in Chinese language tasks.

  • Speed: Can process up to 60 tokens per second, which is three times faster than its predecessor, DeepSeek-V2.
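The throughput figure translates directly into response latency; a minimal back-of-the-envelope helper makes the comparison concrete (the 1,000-token reply length is a hypothetical example):

```python
def generation_time_s(num_tokens, tokens_per_second):
    """Wall-clock seconds to stream num_tokens at a given decode rate."""
    return num_tokens / tokens_per_second

# At DeepSeek-V3's ~60 tokens/s, a 1,000-token reply takes ~17 s;
# at its predecessor's roughly one-third rate (~20 tokens/s), ~50 s.
v3_time = generation_time_s(1000, 60)
v2_time = generation_time_s(1000, 20)
```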

Business Impacts

  • Democratizing AI: DeepSeek-V3 offers cost-effective, high-quality AI capabilities even for smaller organizations.

  • Competitive Pricing: Its API pricing at launch (around $0.28 per million tokens) undercuts closed models, increasing competition in the AI market.

  • Regulatory Compliance: The model complies with Chinese regulatory requirements while demonstrating global competitiveness.
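To see what this pricing means in practice, the snippet below estimates a bill at a flat per-million-token rate. The $0.28 default comes from the figure quoted above and is an assumption: real pricing distinguishes input from output tokens and changes over time.

```python
def api_cost_usd(total_tokens, price_per_million=0.28):
    """Estimate an API bill at a flat per-token rate (hypothetical).

    price_per_million is USD per 1M tokens; the 0.28 default mirrors the
    launch-era figure cited in the text and will differ from live pricing.
    """
    return total_tokens * price_per_million / 1_000_000

# e.g. a month of 120M processed tokens:
monthly = api_cost_usd(120_000_000)  # about 33.6 USD
```

At this rate, even heavy monthly usage stays in the tens of dollars, which is the economic core of the "democratization" argument above.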

Pros and Cons

Pros

  1. High-Level Language Understanding: DeepSeek-V3 can interpret complex language structures, allowing it to provide detailed and contextually relevant responses. This is particularly useful for scientific, technical, or even literary inquiries.

  2. Adaptive Learning: Successive versions of the model are trained on new information and refined from user feedback, so its responses become increasingly accurate and relevant over time.

  3. Multilingual Support: DeepSeek-V3 can communicate in multiple languages, enabling global usage. This is especially valuable for international projects or creating multilingual content.

  4. Speed and Efficiency: The model has optimized algorithms, allowing for quick response times and low resource consumption. This results in excellent performance even when processing large amounts of data.

  5. Creativity and Flexibility: DeepSeek-V3 is capable of generating not just factual information but also creative content such as stories, poems, or even code.


Cons

  1. Limited Contextual Memory: While DeepSeek-V3 can track context within its window, it may lose the thread or forget earlier details in long conversations; this limitation is common to current AI models.

  2. Ethical Concerns: Like any advanced AI model, DeepSeek-V3 may convey incorrect or biased information if the training data contains errors or biases. Therefore, critical thinking and information verification by users are essential.

  3. Energy Demand: Operating DeepSeek-V3 requires significant computational resources, leading to high energy consumption. This can pose environmental challenges.

DeepSeek-V3 describes itself as follows:

"DeepSeek-V3 is an impressive artificial intelligence model that can revolutionize information processing and creative work across various fields. Its advantages include high-level language understanding, adaptive learning, and multilingual support, while attention must also be paid to its limited contextual memory and ethical concerns. DeepSeek-V3 is not just a tool but a continuously evolving intelligent system that could become a cornerstone of future technology."

© 2025 Birow.com