See where each model's capabilities stand at a glance — MMLU / HumanEval / MATH / GPQA / SWE-bench / MMMU / Arena
| # | Model | Vendor | License | Type | Context | Input $/M | Output $/M | MMLU | HumanEval | MATH | GPQA | SWE-b | MMMU | Arena |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 🟠 Claude Opus 4.6 Thinking | Anthropic | closed | Reasoning | 300K | $15.00 | $75.00 | 93.5 | 94.7 | 91.2 | 82.6 | 80.8 | 81.8 | 1504 |
| 2 | 🟠 Claude Opus 4.7 | Anthropic | closed | General | 300K | $5.00 | $25.00 | 94.0 | 95.3 | 86.5 | 78.4 | 87.6 | 82.1 | 1498 |
| 3 | 🔵 Gemini 3.1 Pro Preview | Google DeepMind | closed | Multimodal | 3M | $2.00 | $12.00 | 93.1 | 93.2 | 92.5 | 80.9 | 80.6 | 85.3 | 1493 |
| 4 | ❌ Grok 4.20 Beta | xAI | closed | Reasoning | 512K | $4.00 | $12.00 | 90.8 | 90.3 | 93.7 | 78.5 | 66.4 | — | 1491 |
| 5 | 🟠 Claude Opus 4.6 | Anthropic | closed | General | 300K | $15.00 | $75.00 | 92.8 | 93.9 | 84.0 | 74.2 | 80.8 | 80.3 | 1489 |
| 6 | 🟢 GPT-5.4 High | OpenAI | closed | Reasoning | 400K | $2.50 | $15.00 | 93.6 | 95.1 | 97.8 | 85.1 | 80.2 | 83.0 | 1484 |
| 7 | 🟢 GPT-5.4 | OpenAI | closed | General | 400K | $2.50 | $15.00 | 92.3 | 93.8 | 89.4 | 74.5 | 72.3 | 81.7 | 1472 |
| 8 | 🔵 Gemini 3 Pro | Google DeepMind | closed | Multimodal | 2M | $2.00 | $10.00 | 92.4 | 92.0 | 91.2 | 79.0 | 76.5 | 84.1 | 1471 |
| 9 | 🟢 GPT-5.2 | OpenAI | closed | General | 400K | $2.00 | $10.00 | 91.0 | 92.5 | 85.3 | 70.8 | 80.0 | 80.4 | 1452 |
| 10 | 🟠 Claude Sonnet 4.6 | Anthropic | closed | General | 200K | $3.00 | $15.00 | 91.5 | 93.4 | 80.1 | 68.9 | 73.2 | 76.8 | 1451 |
| 11 | ❌ Grok 4 | xAI | closed | Reasoning | 256K | $5.00 | $15.00 | 89.5 | 88.7 | 88.2 | 74.3 | 46.1 | — | 1420 |
| 12 | 🟠 Claude Sonnet 4.5 | Anthropic | closed | General | 200K | $3.00 | $15.00 | 91.2 | 93.7 | 78.0 | 66.3 | 66.1 | 75.0 | 1410 |
| 13 | 🐳 DeepSeek V3.2 | DeepSeek | open | General | 128K | $0.27 | $1.10 | 89.2 | 90.8 | 86.3 | 65.4 | 73.0 | 68.2 | 1388 |
| 14 | 🅰️ Qwen3 Max | Alibaba Tongyi | closed | Multimodal | 128K | $1.20 | $4.80 | 88.3 | 89.1 | 86.7 | 68.4 | 63.5 | 74.2 | 1362 |
| 15 | 🔵 Gemini 2.5 Flash | Google DeepMind | closed | Multimodal | 1M | $0.10 | $0.40 | 83.4 | 84.2 | 82.7 | 62.4 | — | 72.5 | 1335 |
| 16 | 🌙 Kimi K2.5 | Moonshot | closed | General | 2M | $0.15 | $2.50 | 86.4 | 89.2 | 80.7 | 60.8 | 52.3 | — | 1320 |
| 17 | 🟠 Claude Haiku 4.5 | Anthropic | closed | General | 200K | $0.80 | $4.00 | 85.6 | 86.3 | 66.8 | 52.1 | 55.4 | — | — |
| 18 | 🟢 GPT-5.3 Codex | OpenAI | closed | Reasoning | 400K | $3.00 | $12.00 | 90.8 | 96.4 | 88.6 | 71.2 | 85.0 | 75.9 | — |
| 19 | 🟢 GPT-5 Mini | OpenAI | closed | General | 400K | $0.30 | $1.20 | 82.5 | 85.4 | 74.6 | 55.8 | 48.2 | — | — |
| 20 | 🟢 OpenAI o3 | OpenAI | closed | Reasoning | 200K | $10.00 | $40.00 | 91.3 | 93.6 | 96.3 | 83.3 | 71.7 | — | — |
| 21 | 🐳 DeepSeek R2 | DeepSeek | open | Reasoning | 128K | $0.55 | $2.20 | 92.1 | 93.4 | 98.1 | 78.2 | 68.9 | — | — |
| 22 | 🅰️ Qwen3-Coder-Next | Alibaba Tongyi | open | General | 256K | — | — | — | 91.6 | — | — | 70.6 | — | — |
| 23 | 🧬 GLM-4.6 | Zhipu AI | open | General | 200K | $0.60 | $2.20 | 85.3 | 86.5 | 82.1 | 61.2 | 54.2 | 70.3 | — |
| 24 | 🎬 MiniMax M2.5 | MiniMax | closed | General | 256K | $1.00 | $4.00 | 87.2 | 88.6 | 82.3 | 63.9 | 80.2 | 73.5 | — |
| 25 | ♾️ Llama 4 Maverick | Meta | open | Multimodal | 10M | — | — | 87.2 | 86.9 | 80.3 | 62.1 | 51.6 | 74.5 | — |
| 26 | ♾️ Llama 4 Behemoth | Meta | open | General | 10M | — | — | 90.3 | 91.5 | 87.2 | 70.8 | 68.4 | 78.2 | — |
| 27 | 🇫🇷 Mistral Large 3 | Mistral | closed | General | 256K | $2.00 | $6.00 | 86.3 | 89.0 | 75.8 | 58.2 | 48.1 | — | — |
| 28 | 🟣 Command R+ 2026 | Cohere | closed | General | 128K | $2.50 | $10.00 | 78.2 | 76.4 | 58.3 | 49.6 | — | — | — |
| 29 | 🪟 Phi-5 | Microsoft | open | General | 32K | — | — | 87.6 | 86.1 | 84.2 | 62.3 | — | — | — |
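To compare the pricing columns in practice, the cost of a single request follows directly from the per-million-token rates. A minimal sketch (the 50K-input / 2K-output token counts are illustrative; the $15 / $75 rates are taken from the Claude Opus 4.6 row above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# 50K prompt tokens + 2K completion tokens at $15.00 in / $75.00 out per M:
cost = request_cost(50_000, 2_000, 15.00, 75.00)
print(f"${cost:.2f}")  # → $0.90
```

Note that output tokens dominate far less than the 5× rate gap suggests when prompts are long, which is why long-context, low-input-price models can be cheaper overall for retrieval-heavy workloads.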
Benchmark legend:

- Arena — Chatbot Arena human-preference Elo (LMArena)
- SWE-b — real-world GitHub issue resolution (SWE-bench Verified)
- MATH — competition math problems (MATH-500)
- GPQA — graduate-level reasoning (GPQA Diamond)

Data sources: LMArena · SWE-bench Verified · vendors' official technical reports · Artificial Analysis · last updated 2026-04-22. Scores are re-checked by hand once a month; different release variants (Thinking / High / Codex) diverge significantly, so defer to the official numbers.