Methodology

This is a working draft. Final copy ships in plan 5.

Tools

Every number on this site is produced by hwbench, an open-source benchmark tool. The tool captures CPU, memory, storage, LLM inference, power, and thermal metrics into a single JSON file per run.

Metrics we headline

Generation tok/s. Sustained token-generation rate.
tok/s/W. Tokens per second per watt. This is our primary ranking metric: efficiency matters more than peak throughput for 24/7 home inference.
TTFT. Time-to-first-token; latency matters for chat UX.
Peak CPU temp + min clock. The highest CPU temperature and the lowest clock speed seen during a run; together they expose sustained-load throttling.
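
As a rough illustration of how the two throughput metrics relate, here is a minimal Python sketch. It is not hwbench's code: the function names and the input numbers are made up for this example, and it assumes power is averaged over the same window in which tokens are generated.

```python
def tokens_per_second(tokens_generated: int, elapsed_s: float) -> float:
    """Generation rate: tokens produced divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return tokens_generated / elapsed_s


def tokens_per_second_per_watt(tok_s: float, avg_power_w: float) -> float:
    """Efficiency: generation rate divided by average power draw in watts."""
    if avg_power_w <= 0:
        raise ValueError("avg_power_w must be positive")
    return tok_s / avg_power_w


# Made-up inputs for illustration only, not measured results.
rate = tokens_per_second(tokens_generated=1200, elapsed_s=60.0)
efficiency = tokens_per_second_per_watt(rate, avg_power_w=40.0)
print(f"{rate:.1f} tok/s, {efficiency:.2f} tok/s/W")  # 20.0 tok/s, 0.50 tok/s/W
```

Because a watt is a joule per second, tok/s/W is the same quantity as tokens per joule, which is why it suits machines that run around the clock.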

Reproducibility

Cloning the kranky-ai/hwbench repository and running ./install.sh gets you the same tool we use. Every benchmark commit on this site includes the hwbench git SHA, so any result can be re-run with the exact tool version that produced it.
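
Before re-running a result, it helps to confirm that your checkout is at the commit the result names. The sketch below is a hypothetical helper, not part of hwbench; the SHAs are invented, and the 7-character minimum is a choice made here to avoid matching on very short prefixes.

```python
def same_commit(recorded_sha: str, local_sha: str) -> bool:
    """Return True if two git SHAs name the same commit.

    Either SHA may be abbreviated, so the shorter one is treated
    as a prefix of the longer one.
    """
    a = recorded_sha.strip().lower()
    b = local_sha.strip().lower()
    # Assumption for this sketch: reject prefixes shorter than 7 characters.
    if len(a) < 7 or len(b) < 7:
        return False
    short, full = sorted((a, b), key=len)
    return full.startswith(short)


# Invented SHAs for illustration only.
recorded = "3f2a9c1"
local = "3f2a9c1d8e7b6a5f4c3d2e1f0a9b8c7d6e5f4a3b"
print(same_commit(recorded, local))  # True
```

The local SHA can be read with `git rev-parse HEAD` inside the cloned repository.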

What this site does not do

We do not run vendor-supplied benchmarks.
We do not accept sponsored reviews during Phase 1; sample-with-disclosure reviews may begin later.
We do not anonymize bad results. If a machine is bad, we say so.