Methodology

How we grade the graders.

One rubric, applied the same way to everyone, with the math shown. If we ever take money, it gets disclosed here first. This page is the product.

The five dimensions

Each reviewer is scored 0 to 5 on five dimensions. The weighted average maps to a letter grade. No entry is hand-graded; change a score and the letter recomputes.

Independence: 30% of the grade

Does the site take money from the very entities it ranks? Pay-for-placement, vendor-funded data, and affiliate commissions all pull this down. The less the ranking can be bought, the higher the score.

Evidence basis: 30% of the grade

What is the ranking actually built on? Hands-on testing scores highest, then verified first-hand reviews, then opinion or popularity surveys and self-reported figures, then pay-to-rank, which scores lowest.

Method transparency: 20% of the grade

Is the methodology published, specific, and reproducible? Can a reader see how a given rank was reached, or is it a black box?

Conflict disclosure: 10% of the grade

Are commercial relationships, sponsorships, and affiliate arrangements disclosed clearly and near the rankings themselves, rather than buried?

Manipulation resistance: 10% of the grade

How hard is it to game? Controls against fake reviews, solicited reviews, and vendor gaming raise this; an open box anyone can stuff lowers it.
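
The arithmetic behind the five weights can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the weights and the 0-to-5 range come from the rubric above, while the dictionary keys, the function name, and the example scores are made up for the sketch. The cutoffs that turn the weighted average into a letter are not stated in this section, so the example stops at the numeric score.

```python
# Weights are taken from the rubric above; key names are illustrative assumptions.
WEIGHTS = {
    "independence": 0.30,
    "evidence_basis": 0.30,
    "method_transparency": 0.20,
    "conflict_disclosure": 0.10,
    "manipulation_resistance": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Return the weighted average of the five 0-to-5 dimension scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5, got {value}")
    # The weights sum to 1, so the weighted sum is already the weighted average.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical reviewer, scored for illustration only.
example = {
    "independence": 4,
    "evidence_basis": 3,
    "method_transparency": 5,
    "conflict_disclosure": 2,
    "manipulation_resistance": 3,
}
print(round(weighted_score(example), 2))  # 3.6
```

Because Independence and Evidence basis carry 60% of the weight between them, a one-point change in either moves the result by 0.3, three times as much as a one-point change in Conflict disclosure or Manipulation resistance.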

From score to letter

Grade A

Independent and evidence-based. You can trust the order with eyes open.

Grade B

Mostly sound, with a real but disclosed conflict or a thin evidence basis.

Grade C

Useful signal, compromised method. Read it, do not lean on it.

Grade D

Commercial influence or no real evidence shapes the ranking.

Grade F

The ranking is effectively for sale or fabricated.

The pay-to-rank cap

The weighted rubric has one published exception, and it only ever pushes a grade down. When a site's own published statements — an advertiser disclosure, a ranking-methodology page, an about page, its vendor pricing materials — say that compensation can affect which providers appear in its rankings, or in what order, or what they score, the overall grade is capped at D+ no matter how well the site scores elsewhere. A ranking whose order can be bought is advertising, and no amount of polish on the other dimensions should let it grade like journalism. Every capped entry quotes the site's own wording in its "Follow the money" section, and the full list is at /pay-to-rank.

Three things deliberately do not trigger the cap. Sites that explicitly deny payment affects their rankings are not capped — we record the denial, which is a falsifiable public claim, and grade on the rest of the evidence. Paid slots confined to clearly-labeled ad units do not trigger it either; the cap is for paid influence inside what is presented as a merit ranking. And marketplaces are handled by their role tag instead, since a marketplace does not present its listing order as an editorial verdict in the first place. The cap fires on a site's own words alone, never on our inference — if we got a site's disclosure wrong, tell us and we will correct it in public.
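
The cap's one-way behaviour can be sketched as follows. This is an illustration under stated assumptions, not the site's implementation: the page confirms only that D+ exists and that the cap never raises a grade, so the full plus/minus scale in `GRADE_ORDER`, the function name, and the boolean flag are all hypothetical. The flag stands for "the site's own published statements say compensation can affect its rankings"; a denial, a clearly labeled ad unit, or a marketplace role tag would leave it false.

```python
# Assumed letter scale, best first. Only "D+" is confirmed by the text above.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]
CAP = "D+"


def apply_pay_to_rank_cap(rubric_grade: str, site_says_pay_affects_ranking: bool) -> str:
    """Return the final grade after the pay-to-rank cap; the cap can only lower it."""
    if rubric_grade not in GRADE_ORDER:
        raise ValueError(f"unknown grade: {rubric_grade}")
    if not site_says_pay_affects_ranking:
        return rubric_grade
    # A larger index means a worse grade, so max() picks the worse of the two.
    return max(rubric_grade, CAP, key=GRADE_ORDER.index)


print(apply_pay_to_rank_cap("A-", True))   # D+
print(apply_pay_to_rank_cap("F", True))    # F
print(apply_pay_to_rank_cap("B", False))   # B
```

Taking the worse of the rubric grade and the cap is what makes the exception one-directional: a site already below D+ keeps its lower grade.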

What we grade, and what we don't

We grade the method: how independent the ranking is from the money of those it ranks, and how much real evidence sits underneath it. We do not claim any specific ranking is factually wrong, and we do not yet re-test the underlying products or firms ourselves. A high grade means the method is trustworthy, not that every placement is perfect.

Conflicts and recusal

Our team also operates HearthGrades, an independent senior-care rating site that takes no money from the facilities it grades. It exists because this report card showed senior care to be one of the most conflicted review categories we cover. That also makes us a player in senior care, so we recuse: Plumb will never grade HearthGrades, and the senior-care category page carries this disclosure prominently. A referee should not rate its own player, and you should weigh our senior-care grades knowing we run a site in that category. If we ever operate a site in another category, the same rule applies and it will be disclosed here first.

How sure we are

Every card carries a scoring-confidence level, because not every entry is equally nailed down. High means the facts and the grade are checked against primary sources. Medium means mostly sourced, with a detail still to confirm. Low means an initial assessment from public information, treated as provisional. We would rather show you our uncertainty than hide it.

You will notice there are currently no Low-confidence entries. That is on purpose, not an oversight: if we cannot yet stand behind a grade, we hold the entry back and keep verifying rather than publishing it with a shaky score. So if a site is not on our report card, either we have not graded it yet or its method is not public enough to grade fairly, and you can tell us what is missing. "Low" stays on the scale for entries in progress, so the filter is honest about being empty for now. We would rather under-promise on coverage than show you a grade we do not trust.

Keeping grades alive

A grade that quietly rots is worse than no grade, so every entry carries two dates and they mean different things. "Last verified" is when a human last checked the entry's facts, links, and grade against the live site; grades were last verified June – July 2026. "Site confirmed live" comes from an automated monitor that visits every graded site's link on the first of each month and reports what it found — reachable, moved to a new domain, gone, or hiding behind bot protection (in which case we say so rather than guess). Dead and moved sites are flagged for re-review; closed or retired sites that stay on the board for the record carry a visible "Closed / dormant" chip and rank below every active site in their category. The monitor's raw results are public at /data/link-health.json, per site, machine-readable.
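
The rule that closed or dormant sites rank below every active site in their category can be sketched as a two-part sort key. This is a minimal sketch under assumptions: the `Entry` fields and the function name are hypothetical, and ordering each group by weighted score is a guess, since the text above says only that closed sites come after active ones.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float   # weighted rubric score, 0 to 5 (assumed ranking basis)
    closed: bool   # True if the entry carries the "Closed / dormant" chip


def rank_category(entries: list[Entry]) -> list[Entry]:
    """Active sites first, closed ones last; higher score first within each group."""
    # False sorts before True, so every active entry precedes every closed one.
    return sorted(entries, key=lambda e: (e.closed, -e.score))


# Hypothetical category, for illustration only.
category = [
    Entry("retired-but-strong", 4.8, closed=True),
    Entry("active-weak", 1.9, closed=False),
    Entry("active-strong", 4.1, closed=False),
]
print([e.name for e in rank_category(category)])
# ['active-strong', 'active-weak', 'retired-but-strong']
```

Putting the closed flag first in the key means no score, however high, can lift a dormant site above an active one.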

Last check: July 11, 2026 — 290 sites checked; 220 confirmed live, 2 moved, 0 unreachable, 68 blocked automated visitors (inconclusive, verified by hand at grade reviews).
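
A summary line like the one above could be produced by tallying the monitor's per-site results. The sketch below is illustrative only: the schema of /data/link-health.json is not described on this page, so the record shape (`site`, `status`), the status strings, and the sample data are all assumptions, and the example parses an inline sample rather than fetching the real file.

```python
import json
from collections import Counter

# Hypothetical records; the real file's schema is not described in this document.
SAMPLE_JSON = """
[
  {"site": "alpha.example", "status": "live"},
  {"site": "bravo.example", "status": "live"},
  {"site": "charlie.example", "status": "moved"},
  {"site": "delta.example", "status": "blocked"},
  {"site": "echo.example", "status": "unreachable"}
]
"""

# Dead and moved sites are flagged for re-review; blocked ones are inconclusive.
NEEDS_RE_REVIEW = {"moved", "unreachable"}


def summarize(records: list[dict]) -> Counter:
    """Count how many sites fall under each monitor status."""
    return Counter(record["status"] for record in records)


def flag_for_re_review(records: list[dict]) -> list[str]:
    """List the sites whose status calls for a fresh human review."""
    return [r["site"] for r in records if r["status"] in NEEDS_RE_REVIEW]


records = json.loads(SAMPLE_JSON)
counts = summarize(records)
print(len(records), "sites checked:", dict(counts))
print("re-review:", flag_for_re_review(records))
```

The per-status counts always add up to the number of sites checked, which is also true of the published figures: 220 + 2 + 0 + 68 = 290.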

Types of review site

The grade always measures the same thing, but a review site's type tells you what its rating is made of. A hands-on test and a pay-to-play directory are not doing the same job, and each type has its own way of being trustworthy or being gamed. Here is the lens we use.

Hands-on tester: Buys and tests products hands-on, usually in a lab. The review is the product.

Good for: Tests products itself, so the verdict rests on evidence rather than vendor claims or popularity. The hardest type to fake.

Watch for: Slow and selective, since it can't test everything, and often funded by affiliate links on what it recommends, which can nudge picks toward what pays.

Benchmark: Runs standardized, repeatable tests or metrics and publishes a leaderboard.

Good for: Standardized, repeatable, and directly comparable across contestants. The most objective type when the test is public.

Watch for: Only measures what the test measures, can miss real-world quality, and contestants can optimize to the test. Some let vendors opt in or pay.

Score aggregator: Averages other people's published reviews into one score rather than reviewing anything itself.

Good for: Combines many critics, smoothing out any single reviewer's bias into a quick read on the consensus.

Watch for: Only as good as the reviews it pools, and the weighting (and which critics count) is often hidden.

Editorial reviews: Staff writers produce the reviews and "best of" picks, often funded by affiliate links.

Good for: Human experts apply judgment and explain their reasoning, weighing context a raw score cannot.

Watch for: Usually paid through affiliate links, so "best of" lists can tilt toward products that pay a commission, and a better non-partner option may be left off entirely.

Crowd reviews: Publishes reviews and star ratings submitted by users.

Good for: Real user experiences at scale, which can surface problems experts miss and are hard for any one party to fully control.

Watch for: Easy to game with fake or solicited reviews, skews to extremes (love it or hate it), and the headline average can hide which reviews were filtered out.

Ratings & rankings: Produces its own ratings, rankings, or awards from data or surveys.

Good for: Applies one consistent set of criteria across a whole field, which is handy for a structured comparison.

Watch for: Often built on reputation surveys or self-reported data rather than outcomes, so "best" can mean "best known", and award or seal-licensing fees can blur the line.

Directory / lead-gen: Lists or matches providers, often with paid placement or per-lead fees.

Good for: Comprehensive coverage of providers in one place, convenient for finding your options.

Watch for: The order is usually pay-to-play: who sits on top reflects who paid for placement or leads, not who is best. Treat the ranking as advertising.

Marketplace: A place you browse, book, or transact; the reviews are attached to that.

Good for: Reviews are often tied to a verified booking or purchase, which makes them harder to fake.

Watch for: The site earns money on the transaction, so placement and ranking are tuned for conversion and revenue, not neutral merit. Reviews are a feature, not the mission.

Sources

Every claim on a reviewer's page links to a source: a primary document (a company's own methodology page, an SEC or IRS filing, a regulator's rule) wherever one exists, and reputable reporting otherwise. When a figure is self-reported and unaudited, we say so on the page.

Found an error or a stale fact? That is the most useful thing you can send us — email corrections@plumbreviews.com. We correct in public and note the date on the corrections & changes page, because a review site that will not take a correction is exactly the kind we are grading.