Artificial Analysis

Item: Artificial Analysis
Rating: 4.1
Author: Plumb

Independent (Artificial Analysis, Inc.; seed-backed by AI Grant / Nat Friedman & Daniel Gross)

Benchmark · Free to read

Standardized, unbought AI benchmarks; a useful filter, not a final verdict.

What it's really for: An independent AI benchmark; the public leaderboards lead into paid enterprise reports.

What our grade covers: The grade on this page covers its LLM intelligence, speed, and price benchmarks, not everything the site does.

High Scoring Confidence: Checked against primary sources. We are confident in the facts and the grade here.

Follow the money

The same AI labs and inference providers it ranks are also its paying customers for private custom benchmarking and enterprise reports. The founders state explicitly that "no one pays to be on the website" and "you can't pay us for better results," so, by their account, paying does not buy public placement.

Operating since: 2023 (3 years)
What it costs you: Free to read. The public leaderboards cost nothing to view.
How they make money: Paid enterprise "insights" report subscriptions and private custom benchmarking commissioned by AI-stack companies; the public leaderboards remain free.
What they do: Independently benchmarks and ranks LLMs and inference API providers on intelligence (a composite of ~10 eval datasets), speed/latency, and live API price, publishing the results as free public leaderboards.
What to watch for: You get standardized, lab-style benchmark scores, not a guarantee that they match your real-world use case. A high score can also reflect a vendor optimizing its model for the benchmarks themselves ("benchmark gaming"), so treat the index as a starting filter rather than a verdict.
Composite score: 4.10 / 5.00 → grade A-
Last verified: July 11, 2026 — when we last checked this entry's facts, links, and grade against the live site.
Site confirmed live: July 11, 2026 — our monthly automated check reached Artificial Analysis and got a normal response.

How the grade was reached

Independence · 30% weight 4 / 5

Does the site take money from the very entities it ranks? Pay-for-placement, vendor-funded data, and affiliate commissions all pull this down. The less the ranking can be bought, the higher the score.

Evidence basis · 30% weight 5 / 5

What is the ranking actually built on? Hands-on testing scores highest, then verified first-hand reviews, then opinion or popularity surveys and self-reported figures, then pay-to-rank, which scores lowest.

Method transparency · 20% weight 4 / 5

Is the methodology published, specific, and reproducible? Can a reader see how a given rank was reached, or is it a black box?

Conflict disclosure · 10% weight 2 / 5

Are commercial relationships, sponsorships, and affiliate arrangements disclosed clearly and near the rankings themselves, rather than buried?

Manipulation resistance · 10% weight 4 / 5

How hard is it to game? Controls against fake reviews, solicited reviews, and vendor gaming raise this; an open box anyone can stuff lowers it.
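
As a quick sanity check on the arithmetic, the sketch below recomputes the composite from the five criterion scores and weights listed above. It assumes the composite is a plain weighted average; the page does not spell out the formula, but this assumption reproduces the published 4.10. The names CRITERIA and composite_score are hypothetical, chosen only for illustration, and the mapping from score to letter grade is not stated on the page, so it is deliberately left out.

```python
# Weights (in percent) and scores (out of 5) copied from the breakdown above.
# The dictionary layout and function name are illustrative, not the site's own.
CRITERIA = {
    "Independence": (30, 4),
    "Evidence basis": (30, 5),
    "Method transparency": (20, 4),
    "Conflict disclosure": (10, 2),
    "Manipulation resistance": (10, 4),
}


def composite_score(criteria):
    """Return the weighted average of the scores (assumed formula)."""
    total_weight = sum(weight for weight, _ in criteria.values())
    if total_weight != 100:
        raise ValueError(f"weights must sum to 100, got {total_weight}")
    # Integer percent weights keep the intermediate sum exact.
    weighted_sum = sum(weight * score for weight, score in criteria.values())
    return weighted_sum / 100


if __name__ == "__main__":
    print(f"{composite_score(CRITERIA):.2f} / 5.00")
```

Under this assumption the low Conflict disclosure score costs little: at a 10% weight, each point on that criterion moves the composite by only 0.10.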

Evidence

Built as a side project in 2023, while co-founder Micah Hill-Smith was building a legal AI assistant, and launched publicly in January 2024 by Hill-Smith (CEO) and George Cameron (CPO); seed-funded via Nat Friedman and Daniel Gross's AI Grant. The founders say: "You can't pay us for better results... there's no use doing what we do unless it's independent," and "no one pays to be on the website." Revenue comes from an enterprise insights/report subscription and private custom benchmarking. Source: Latent Space interview with co-founders George Cameron and Micah Hill-Smith.
Methodology pages show in-house, hands-on evaluation: "We maintain internal copies of all evaluation datasets," plus their own harnesses and code-execution infrastructure. Prompt templates, answer-extraction patterns, scoring criteria, and dataset citations are published for most evals, making the results largely (though not perfectly) reproducible. However, the pages carry minimal conflict-of-interest disclosure: no statement of funding, investors, or commercial relationships with the model developers being ranked appears near the rankings. Source: Artificial Analysis Intelligence Benchmarking Methodology.
The public product is a free leaderboard comparing 100+ AI models from OpenAI, Google, DeepSeek, and others across intelligence, speed, latency, and live API price. The Intelligence Index is a composite score, normalized to 0-100, that aggregates multiple challenging evaluation datasets. All leaderboard data is freely accessible without an account. Source: Artificial Analysis LLM Leaderboard.

Compare with others

Others reviewing AI models

A+ EvalPlus Leaderboard
A+ Stanford HELM
A SWE-bench
A- The Verge
A- Vals AI
A- Digital Trends
B+ Engadget
B TechRadar
C+ LMArena
C+ Vellum LLM Leaderboard
C- There's An AI For That (TAAFT)
C- Product Hunt
D+ Futurepedia
D Toolify AI
A+ Hugging Face Open LLM Leaderboard
B- Papers with Code
