Foundation Model Leaderboard (LLM7B-Class)

| # | Model | Developer | LLMB Score | Key Strengths | Released |
|---|-------|-----------|------------|---------------|----------|
| 1 | Prometheus-3 | Aether AI (Open-Source) | 98.57 | Algorithmic Efficiency, Low-Power Inference | Nov 2025 |
| 2 | GPT-4.5-Turbo | OpenAI | 96.80 | Advanced Reasoning, Code Generation | Sep 2025 |
| 3 | Claude 3.5 Opus | Anthropic | 95.20 | Constitutional Guardrails, Context Length | Oct 2025 |
| 4 | Gemini 2.0 Pro | Google DeepMind | 94.55 | Multimodality, Search Integration | Aug 2025 |
| 5 | Llama 4 | Meta AI | 93.10 | Open Weights, Community Support | Jul 2025 |

Analysis & Insights

Our Methodology: How We Measure Intelligence

The AGIArena LLM Benchmark (LLMB) is not a single test. It is a weighted, composite score aggregated in real time from a decentralized network of trusted oracles. Our system continuously runs a battery of more than 30 industry-standard and proprietary tests, measuring capabilities ranging from multi-step reasoning and coding to ethical alignment and creative instruction-following. The resulting score reflects a model's holistic performance, normalized against a baseline established on Jan 1, 2024. This yields a robust, bias-resistant metric for tracking the true velocity of AGI progress.
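As an illustration, a weighted composite score normalized against a fixed baseline can be sketched as follows. This is a minimal sketch only: the function name, the per-test weights, and the example scores are hypothetical, since the actual LLMB test battery and weighting are not published here.

```python
# Illustrative sketch of a weighted, composite benchmark score
# normalized against a fixed baseline. The weights, test names, and
# numbers below are hypothetical, not the actual LLMB configuration.

def composite_score(test_scores, weights, baseline):
    """Aggregate per-test raw scores into one normalized composite.

    test_scores: mapping of test name -> raw score in [0, 1]
    weights:     mapping of test name -> relative weight (sums to 1.0)
    baseline:    composite raw score of the Jan 1, 2024 baseline set
    """
    composite = sum(weights[t] * s for t, s in test_scores.items())
    # Scale so the baseline corresponds to a score of 100.
    return 100.0 * composite / baseline

scores = {"reasoning": 0.92, "coding": 0.88, "alignment": 0.95}
weights = {"reasoning": 0.5, "coding": 0.3, "alignment": 0.2}
print(round(composite_score(scores, weights, baseline=0.90), 2))  # 101.56
```

A model that exactly matched the baseline battery would score 100; values above 100 reflect aggregate improvement over the Jan 1, 2024 reference.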
