GPU Performance Comparison for Large Language Models

Gábor Bíró January 11, 2025
2 min read

The rapid development of Large Language Models (LLMs) poses new challenges for computing hardware. A crucial question for me is how different GPUs perform when running these models. In this post, I examine the performance of various GPUs through the concepts of TFLOPS (trillion floating-point operations per second) and TOPS (trillion operations per second), presenting the capabilities of each card in a table with brief explanations.

GPU performance comparison for Large Language Models (illustration: own work)

TOPS (Tera Operations Per Second) and FLOPS (Floating Point Operations Per Second) are two important metrics for characterizing GPU performance, but they describe different types of computational operations, which matters for both running and training LLMs.

TOPS (Tera Operations Per Second)

  • TOPS generally measures the performance of integer operations (INT8, INT16, INT32, etc.).
  • It is typically quoted for AI accelerators (e.g., Tensor Cores, NPUs, TPUs) because LLM inference (generating output tokens) often runs on quantized integer arithmetic, which is cheaper than floating-point math.
  • For inference, INT8 or even INT4 operations are used because they reduce computational and memory requirements without significantly degrading model quality; this is why the advertised performance of AI accelerators is often specified in TOPS. A small quantization sketch follows this list.
  • Example: A GPU might have a performance of 200 TOPS for INT8 operations, meaning it can perform 200 trillion integer operations per second.
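
To make this concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization in PyTorch. The helper names (quantize_int8, dequantize) and the 4096×4096 weight matrix are illustrative choices, not the API of any particular quantization library:

```python
import torch

def quantize_int8(weights: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: map the FP32 range to [-127, 127]."""
    scale = weights.abs().max() / 127.0          # one scale factor for the whole tensor
    q = torch.clamp(torch.round(weights / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP32 approximation of the original weights."""
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                      # a hypothetical FP32 weight matrix
q, scale = quantize_int8(w)

print(f"FP32 size: {w.numel() * 4 / 2**20:.1f} MiB")   # 64.0 MiB
print(f"INT8 size: {q.numel() * 1 / 2**20:.1f} MiB")   # 16.0 MiB
print(f"max abs error: {(w - dequantize(q, scale)).abs().max():.4f}")
```

The INT8 copy needs a quarter of the memory of the FP32 original, which is exactly why quantized inference is attractive on memory-bandwidth-bound GPUs.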

FLOPS (Floating Point Operations Per Second)

  • FLOPS measures the execution speed of floating-point operations (FP16, FP32, FP64).
  • It is crucial for LLM training because large models require FP16 or FP32 precision for accurate weight and gradient calculations.
  • Example: A modern GPU might have 20 TFLOPS (TeraFLOPS) FP32 performance, meaning it can perform 20 trillion floating-point operations per second.
  • For very large models (e.g., GPT-4 or Gemini), FP16 (half-precision floating-point) and bfloat16 (BF16) operations are also used because they are faster while still being sufficiently accurate for training; the snippet after this list shows the range difference between the two 16-bit formats.
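
The practical difference between FP16 and BF16 is range versus precision: both use 16 bits, but BF16 keeps FP32's 8 exponent bits at the cost of a shorter mantissa. A quick sketch with PyTorch's torch.finfo (the value 70,000 is just an arbitrary example that overflows FP16):

```python
import torch

# Compare the floating-point formats commonly used in LLM training.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):15s} max={info.max:.3e}  smallest normal={info.tiny:.3e}  eps={info.eps:.3e}")

# FP16 overflows where BF16 (same exponent range as FP32) does not:
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # inf  -> above FP16's ~65504 maximum
print(x.to(torch.bfloat16))  # 70144 (coarsely rounded, but finite)
```

This is why BF16 has become popular for training large models: gradients and activations can span a wide dynamic range, and BF16 avoids the overflow problems that FP16 typically needs loss scaling to work around.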

The table below summarizes the key figures for a number of current GPUs:

| GPU | Tensor/AI Cores | FP32 (TFLOPS) | FP16 (TFLOPS) | BF16 (TFLOPS) | INT8 (TOPS) | VRAM (GB) | Mem. Bandwidth (GB/s) | Power Consumption (W) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NVIDIA H200 SXM | 528 | 67 | 1,979 | 1,979 | 3,958 | 141 (HBM3e) | 4,800 | 600-700 |
| NVIDIA H100 SXM | 528 | 67 | 1,979 | 1,979 | 3,958 | 80 (HBM3) | 3,350 | 350-700 |
| NVIDIA H100 PCIe | 456 | 51 | 1,513 | 1,513 | 3,026 | 80 (HBM2e) | 2,000 | 300-350 |
| NVIDIA A100 PCIe | 432 | 19.5 | 312 | 312 | 624 | 80 (HBM2e) | 1,935 | 250-400 |
| RTX 6000 ADA | 568 | 91.1 | - | - | - | 48 (GDDR6 ECC) | 960 | 300 |
| NVIDIA L40S | 568 | 91.6 | - | - | - | 48 (GDDR6 ECC) | 864 | 350 |
| RTX A6000 | 336 | 38.7 | - | - | - | 48 (GDDR6) | 768 | 300 |
| NVIDIA RTX 5090 | 680 | 104.8 | 450 | - | 900 | 32 (GDDR7) | 1,790 | 575 |
| NVIDIA RTX 4090 | 512 | 82.6 | 330 | - | 660 | 24 (GDDR6X) | 1,008 | 450 |
| NVIDIA RTX 3090 | 328 | 40 | 285 | - | - | 24 (GDDR6X) | 936 | 350 |
| NVIDIA RTX 2080 Ti | 544 | 14.2 | 108 | - | - | 11 (GDDR6) | 616 | 260 |
| AMD MI300X | - | 61 | 654? | 1,307 | 2,615 | 192 (HBM3) | 5,200 | 750 |
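
The figures above are theoretical peaks; sustained throughput on real workloads is lower. A rough way to check what a given card actually delivers is to time a large FP16 matrix multiplication, the dominant operation in LLM inference and training. Here is a benchmark sketch, assuming a CUDA-capable GPU with PyTorch installed (the measure_tflops helper and its 8192×8192 default are my own illustration):

```python
import time
import torch

def measure_tflops(n: int = 8192, dtype=torch.float16, iters: int = 20) -> float:
    """Time an n x n matrix multiplication and convert it to TFLOPS."""
    assert torch.cuda.is_available(), "requires a CUDA GPU"
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)

    for _ in range(3):                      # warm-up runs
        torch.matmul(a, b)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()                # wait for all kernels to finish
    elapsed = time.perf_counter() - start

    flops = 2 * n**3 * iters                # ~2*n^3 floating-point operations per matmul
    return flops / elapsed / 1e12

print(f"achieved FP16: {measure_tflops():.1f} TFLOPS")
```

Expect the measured value to land below the advertised peak, since the headline tensor figures assume ideal conditions and, in many cases, 2:4 structured sparsity.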