Vals AI

Item: Vals AI
Rating: 4
Author: Plumb

Independent (Vals AI, Inc.; VC-backed, no corporate parent)

Benchmark · Free to read

A rare independent benchmark of AI on real legal and tax work, with the catch that vendors opt in and some pay.

What it's really for: An independent AI benchmark; testing models on real expert tasks is the whole job.

What our grade covers: The grade on this page is about its hands-on AI leaderboards on legal, tax, and finance tasks, not everything the site does.

Scoring confidence: High. Checked against primary sources. We are confident in the facts and the grade here.

Follow the money

Vals AI is paid by some of the AI vendors it benchmarks. Its Vals Legal AI Report discloses that "Vals AI has a customer relationship with one or more of the participants." Those participants (Harvey, Thomson Reuters, vLex, Vecflow) joined voluntarily and chose which skills to be evaluated on. At least some of the parties it ranks therefore also fund it and shape what gets measured.


Operating since: 2023 (3 years)
What it costs you: Free to read.
How they make money: It earns revenue from enterprise/custom benchmarking engagements, licensing of its private validation datasets, and access to its evaluation platform/infrastructure, including from some of the AI vendors it evaluates.
What they do: It publishes independent leaderboards that score AI models and agentic products on realistic legal, tax, and finance tasks, testing them hands-on against expert-built, private held-out test sets.
What to watch for: In its industry reports the evaluated vendors participate voluntarily and pick which skills they're scored on (and some are also paying Vals customers), so a leaderboard may omit tasks where a vendor would look weak rather than show the full picture.
Composite score: 4.00 / 5.00 → grade A-
Last verified: July 11, 2026 — when we last checked this entry's facts, links, and grade against the live site.
Site confirmed live: July 11, 2026 — our monthly automated check reached Vals AI and got a normal response.

How the grade was reached

Independence · 30% weight 3 / 5

Does the site take money from the very entities it ranks? Pay-for-placement, vendor-funded data, and affiliate commissions all pull this down. The less the ranking can be bought, the higher the score.

Evidence basis · 30% weight 5 / 5

What is the ranking actually built on? Hands-on testing scores highest, then verified first-hand reviews, then opinion or popularity surveys and self-reported figures, then pay-to-rank, which scores lowest.

Method transparency · 20% weight 4 / 5

Is the methodology published, specific, and reproducible? Can a reader see how a given rank was reached, or is it a black box?

Conflict disclosure · 10% weight 4 / 5

Are commercial relationships, sponsorships, and affiliate arrangements disclosed clearly and near the rankings themselves, rather than buried?

Manipulation resistance · 10% weight 4 / 5

How hard is it to game? Controls against fake reviews, solicited reviews, and vendor gaming raise this; an open box anyone can stuff lowers it.
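
The 4.00 composite listed on this page is consistent with a plain weighted average of the five criterion scores above. The sketch below reproduces that arithmetic, assuming the composite really is a simple weighted sum; the page does not spell out the formula, and the names `RUBRIC` and `composite_score` are hypothetical. The step from 4.00 to the letter grade A- is left out because the grade thresholds are not given here.

```python
# Criterion -> (weight in percent, score out of 5), copied from the rubric above.
# Assumption: the composite is a simple weighted average of these scores.
RUBRIC = {
    "Independence": (30, 3),
    "Evidence basis": (30, 5),
    "Method transparency": (20, 4),
    "Conflict disclosure": (10, 4),
    "Manipulation resistance": (10, 4),
}


def composite_score(rubric):
    """Return the weighted average of the criterion scores on a 0-5 scale."""
    total_weight = sum(weight for weight, _ in rubric.values())
    if total_weight != 100:
        raise ValueError(f"weights must sum to 100, got {total_weight}")
    # Integer percent weights keep the arithmetic exact until the final division.
    weighted_sum = sum(weight * score for weight, score in rubric.values())
    return weighted_sum / 100


if __name__ == "__main__":
    print(f"Composite score: {composite_score(RUBRIC):.2f} / 5.00")
```

Because Independence and Evidence basis each carry 30% of the weight, the low Independence score (3) and the high Evidence basis score (5) offset each other, which leaves the composite at the level of the three remaining criteria.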

Evidence

Vals AI was founded in 2023 by Rayan Krishnan and Langston Nashold, who left Stanford's AI master's program to build a third-party LLM evaluation system with Stanford researchers and domain experts in law, accounting, and finance; it is an independent San Francisco startup backed by investors including Pear VC and a Sequoia scout. Source: TechTimes / Bloomberg coverage of Vals AI launch →
Vals AI's methodology builds custom, expert-curated benchmarks and uses a three-tier dataset structure: a small public set, a licensed private validation set, and a test set that 'remains private at all times' and is the only set used for published benchmarks, to prevent leakage into model training. It reports accuracy plus latency, cost, and error analysis from hands-on model/agent runs. Source: Vals AI – Our Methodology →
In the Vals Legal AI Report, evaluated vendors (Harvey, Thomson Reuters, vLex, Vecflow; LexisNexis withdrew from most skills) participated voluntarily and each chose which skills to opt into, and the report states: 'Note that Vals AI has a customer relationship with one or more of the participants.' Independent commentary (Artificial Lawyer) notes this transparency 'naturally raise[s] doubts about impartiality.' Source: Vals Legal AI Report (VLAIR) + Artificial Lawyer →

Compare with others

Others reviewing AI models

A+ EvalPlus Leaderboard
A+ Stanford HELM
A SWE-bench
A- Artificial Analysis
A- The Verge
A- Digital Trends
B+ Engadget
B TechRadar
C+ LMArena
C+ Vellum LLM Leaderboard
C- There's An AI For That (TAAFT)
C- Product Hunt
D+ Futurepedia
D Toolify AI
A+ Hugging Face Open LLM Leaderboard
B- Papers with Code
