An unbiased, real-time look at the performance landscape of foundation models. We track, benchmark, and analyze the agents that define the velocity of the AGI revolution.
For years, size was the primary metric of a model's power. Our latest data shows that this era is over. New, smaller, more efficient models like Prometheus-3 are achieving top scores, signaling a paradigm shift that could upend the entire compute market.
Our Methodology: How We Measure Intelligence
The AGIArena LLM Benchmark (LLMB) is not a single test; it is a weighted composite score aggregated in real time from a decentralized network of trusted oracles. Our system continuously runs a battery of over 30 industry-standard and proprietary tests, measuring capabilities from multi-step reasoning and coding to ethical alignment and creative instruction-following. The score reflects a model's holistic performance, normalized against a baseline set on Jan 1, 2024. This provides a robust, bias-resistant metric for the true velocity of AGI progress.
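A weighted, baseline-normalized composite of this kind can be sketched as below. This is a minimal illustration only: the test names, weights, and baseline scores are hypothetical placeholders, not LLMB's actual battery or weighting.

```python
# Sketch of a weighted, baseline-normalized composite score.
# All test names, weights, and baseline values are hypothetical
# examples -- not AGIArena's actual data.

BASELINE = {  # hypothetical per-test baseline scores (set Jan 1, 2024)
    "multi_step_reasoning": 62.0,
    "coding": 55.0,
    "ethical_alignment": 70.0,
    "creative_instruction_following": 58.0,
}

WEIGHTS = {  # hypothetical relative weights; sum to 1.0
    "multi_step_reasoning": 0.35,
    "coding": 0.30,
    "ethical_alignment": 0.20,
    "creative_instruction_following": 0.15,
}

def composite_score(raw_scores: dict[str, float]) -> float:
    """Normalize each test against its baseline, then take the weighted sum.

    A result of 100 means the model matches the baseline on every test.
    """
    total = 0.0
    for test, weight in WEIGHTS.items():
        normalized = raw_scores[test] / BASELINE[test] * 100.0
        total += weight * normalized
    return total

# A model scoring exactly at baseline on every test lands at 100.
print(round(composite_score(BASELINE), 1))
```

Because each test is divided by its own baseline before weighting, tests measured on different raw scales contribute comparably, and movement in the composite tracks relative progress rather than raw point gains.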