A rare independent benchmark of AI on real legal and tax work, with the catch that vendors opt in and some pay.
What it's really for: An independent AI benchmark; testing models on real expert tasks is the whole job.
What our grade covers: The grade on this page is about Vals AI's hands-on AI leaderboards on legal, tax, and finance tasks, not everything the site does.
High Scoring Confidence: Checked against primary sources. We are confident in the facts and the grade here.
Vals AI is paid by some of the AI vendors it benchmarks. Its Vals Legal AI Report discloses that "Vals AI has a customer relationship with one or more of the participants." Those participants (Harvey, Thomson Reuters, vLex, Vecflow) joined voluntarily and chose which skills to be evaluated on, so some of the parties it ranks also fund it, and all of them shape what gets measured.
- Operating since: 2023 (3 years)
- What it costs you: Free to read.
- How they make money: It earns revenue from enterprise and custom benchmarking engagements, licensing of its private validation datasets, and access to its evaluation platform and infrastructure, including from some of the AI vendors it evaluates.
- What they do: It publishes independent leaderboards that score AI models and agentic products on realistic legal, tax, and finance tasks, testing them hands-on against expert-built, private held-out test sets.
- What to watch for: In its industry reports, the evaluated vendors participate voluntarily and pick which skills they're scored on (and some are also paying Vals customers), so a leaderboard may omit tasks where a vendor would look weak rather than show the full picture.
- Composite score: 4.00 / 5.00 → grade A-
How the grade was reached
Does the site take money from the very entities it ranks? Pay-for-placement, vendor-funded data, and affiliate commissions all pull this down. The less the ranking can be bought, the higher the score.
What is the ranking actually built on? Hands-on testing scores highest, then verified first-hand reviews, then opinion or popularity surveys and self-reported figures, then pay-to-rank, which scores lowest.
Is the methodology published, specific, and reproducible? Can a reader see how a given rank was reached, or is it a black box?
Are commercial relationships, sponsorships, and affiliate arrangements disclosed clearly and near the rankings themselves, rather than buried?
How hard is it to game? Controls against fake reviews, solicited reviews, and vendor gaming raise this; an open box anyone can stuff lowers it.
Evidence
- Vals AI was founded in 2023 by Rayan Krishnan and Langston Nashold, who left Stanford's AI master's program to build a third-party LLM evaluation system with Stanford researchers and domain experts in law, accounting, and finance; it is an independent San Francisco startup backed by investors including Pear VC and a Sequoia scout. Source: TechTimes / Bloomberg coverage of Vals AI launch →
- Vals AI's methodology builds custom, expert-curated benchmarks and uses a three-tier dataset structure: a small public set, a licensed private validation set, and a test set that 'remains private at all times' and is the only set used for published benchmarks, to prevent leakage into model training. It reports accuracy plus latency, cost, and error analysis from hands-on model/agent runs. Source: Vals AI – Our Methodology →
- In the Vals Legal AI Report, evaluated vendors (Harvey, Thomson Reuters, vLex, Vecflow; LexisNexis withdrew from most skills) participated voluntarily and each chose which skills to opt into, and the report states: 'Note that Vals AI has a customer relationship with one or more of the participants.' Independent commentary (Artificial Lawyer) notes this transparency 'naturally raise[s] doubts about impartiality.' Source: Vals Legal AI Report (VLAIR) + Artificial Lawyer →
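The three-tier dataset structure described in the evidence above can be illustrated with a small sketch. This is not Vals AI's code: the names `Tier`, `Task`, and `published_accuracy` are hypothetical, and exact-match scoring is an assumption standing in for whatever expert grading the real benchmarks use. The only point it shows is that a published score is computed from the private test tier alone, so results on the public or licensed sets cannot move it.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical labels for the three tiers named in the methodology.
    PUBLIC = "public"                   # small set anyone can inspect
    PRIVATE_VALIDATION = "validation"   # licensed to customers
    PRIVATE_TEST = "test"               # stays private at all times


@dataclass(frozen=True)
class Task:
    task_id: str
    tier: Tier
    expected: str  # assumption: a single exact-match reference answer


def published_accuracy(tasks, answers):
    """Return accuracy over PRIVATE_TEST tasks only; other tiers are ignored."""
    test_tasks = [t for t in tasks if t.tier is Tier.PRIVATE_TEST]
    if not test_tasks:
        raise ValueError("no private test tasks to score")
    correct = sum(answers.get(t.task_id) == t.expected for t in test_tasks)
    return correct / len(test_tasks)


# Illustrative data only; none of these tasks or answers are real.
tasks = [
    Task("pub-1", Tier.PUBLIC, "A"),
    Task("val-1", Tier.PRIVATE_VALIDATION, "B"),
    Task("test-1", Tier.PRIVATE_TEST, "C"),
    Task("test-2", Tier.PRIVATE_TEST, "D"),
]
answers = {"pub-1": "A", "val-1": "B", "test-1": "C", "test-2": "X"}

score = published_accuracy(tasks, answers)
print(score)
```

In this toy data the model answers both non-test tasks correctly, yet only the two private test tasks count toward the printed score. That mirrors the leakage argument: a model that has seen the public or licensed sets gains nothing on the leaderboard from them.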