See How the Top LLMs Stack Up
LLM Leaderboard
Knowing the best LLMs is key to building the best AI applications, but evaluating them is a daunting task.
MMLU Benchmarks
Comprehensive testing across 57 subjects including mathematics, history, law, and medicine to evaluate LLM knowledge breadth.
HumanEval+
Extended version of HumanEval with more complex programming challenges across multiple languages to test code quality.
GPQA Evaluation
Graduate-level expert knowledge evaluation designed to test advanced reasoning in specialized domains.
MT-Bench Analysis
Multi-turn benchmarking that evaluates conversation abilities, reasoning, and instruction following across complex dialogues.
SWE Benchmarks
Software engineering tests including code generation, debugging, and algorithm design to measure programming capabilities.
GSM8K Reasoning
Grade school math word problems requiring multi-step reasoning to evaluate logical thinking and problem-solving capabilities.
Top LLM Models by MMLU Score
The top LLMs ranked by MMLU score, measuring breadth of knowledge across 57 subjects from mathematics to medicine.
Fastest LLM Models by Throughput
The fastest LLMs ranked by output tokens generated per second, measuring raw processing speed and efficiency.
Most Cost-Effective LLM Models
The most affordable LLMs ranked by cost per token, helping you optimize your budget without compromising quality.
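Per-token prices translate directly into a budget estimate: multiply expected monthly token volume by each model's rate. The sketch below illustrates the arithmetic; the model names and prices are hypothetical placeholders, not live rates.

```python
# Sketch: estimate monthly spend from per-token pricing.
# Prices are hypothetical placeholders (USD per 1M tokens), not live rates.
PRICING = {
    "model-a": {"input": 0.80, "output": 4.00},
    "model-b": {"input": 3.00, "output": 15.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given monthly token volume."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 50M input + 10M output tokens per month.
for name in PRICING:
    print(f"{name}: ${monthly_cost(name, 50_000_000, 10_000_000):,.2f}")
```

Output prices are typically several times input prices, so a workload's input/output mix matters as much as the headline rate.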
Longest Context Window
The maximum number of tokens a model can process in a single request.
Observe how different processing speeds affect real-time token generation.
1200 t/s
The quick brown fox jumps over the lazy dog. Meanwhile, a clever rabbit watches from nearby bushes, intrigued by the scene unfolding before its eyes. The fox continues its playful pursuit, demonstrating remarkable agility and grace in motion. As the sun sets on the horizon, the forest comes alive with the sounds of nature, creating a symphony of rustling leaves and gentle breezes. The fox pauses, alert to these changes, its ears perked up to catch every subtle noise in the surroundings.
200 t/s and 40 t/s render the same sample passage at their respective speeds.
Values reset every 5 seconds to demonstrate different speeds
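The demo above can be approximated in a few lines: pace token emission with a fixed delay of 1/rate seconds. This is a minimal sketch that splits on whitespace as a stand-in for real tokenization.

```python
import time

def stream_tokens(text: str, tokens_per_second: float) -> float:
    """Emit whitespace-delimited 'tokens' at a fixed rate; return elapsed seconds."""
    delay = 1.0 / tokens_per_second
    start = time.perf_counter()
    for token in text.split():
        print(token, end=" ", flush=True)
        time.sleep(delay)
    print()
    return time.perf_counter() - start

sample = "The quick brown fox jumps over the lazy dog"
# At 40 t/s this 9-token sample takes roughly a quarter second;
# at 1200 t/s it is nearly instantaneous.
```

Real streaming APIs deliver tokens as they are generated, so perceived speed also depends on time-to-first-token, not just sustained throughput.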
Compare any two LLM models side by side across different metrics, including MMLU, GPQA, HumanEval, DROP, Context Size, Parameters, Input Price, Output Price, Inference Speed, Throughput, and Latency.
| Metric | Claude 3.5 Haiku | Claude 3.7 Sonnet |
| --- | --- | --- |
| Provider | Anthropic | Anthropic |
| MMLU Score | 63.4% | 80.3% |
| GPQA Score | 40.8% | 65.6% |
| Context Size | 200,000 tokens | 200,000 tokens |
| Parameters | N/A | N/A |
| Input Price | $0.80 / 1M tokens | $3.00 / 1M tokens |
| Throughput | 49.093 t/s | N/A |
| Latency | 0.689 s | N/A |
