Local LLM benchmarking

Benchmark local models with repeatable prompt runs

Select one or more models from the prebuilt local catalog, define prompts with per-prompt generation settings, run sequential inference benchmarks, and compare averages and per-run details in interactive charts.

1. Model Selection

Choose one or more local models from the prebuilt catalog.
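A minimal sketch of what a catalog entry and the selection step might look like in TypeScript; the field names here are illustrative assumptions, not the app's actual schema:

    // Illustrative catalog entry; field names are assumptions. Real
    // catalogs typically also carry quantization and memory metadata.
    interface CatalogModel {
      id: string;          // e.g. "Llama-3.2-1B-Instruct-q4f16_1"
      displayName: string;
      sizeMB: number;      // approximate download size
    }

    // Resolve the user's selection against the prebuilt catalog.
    function selectModels(catalog: CatalogModel[], ids: string[]): CatalogModel[] {
      const wanted = new Set(ids);
      return catalog.filter((m) => wanted.has(m.id));
    }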

2. Prompt Setup

Create prompts and tune generation settings per prompt.
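One plausible shape for a prompt and its per-prompt settings; this is a sketch, and the field names and defaults are assumptions:

    // Illustrative prompt definition with per-prompt generation settings.
    interface PromptConfig {
      id: string;
      text: string;
      maxTokens: number;   // cap on generated tokens per run
      temperature: number; // 0 makes decoding greedy and more repeatable
      topP: number;        // nucleus-sampling cutoff
      runs: number;        // repetitions averaged in step 4
      seed?: number;       // optional fixed seed for repeatable sampling
    }

    const prompt1: PromptConfig = {
      id: "prompt-1",
      text: "Summarize the rules of chess in three sentences.",
      maxTokens: 256,
      temperature: 0.7,
      topP: 0.9,
      runs: 5,
    };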


3. Run

Runs execute sequentially, one inference at a time; a failed run is logged and skipped so the rest of the benchmark continues.
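A minimal sketch of that loop, reusing the CatalogModel and PromptConfig shapes above; runOne() is a hypothetical executor for a single model/prompt run:

    // Hypothetical single-run executor, assumed to throw on failure.
    declare function runOne(modelId: string, prompt: PromptConfig): Promise<RunResult>;

    interface RunResult {
      modelId: string;
      promptId: string;
      inferenceMs: number;  // end-to-end duration
      ttftMs: number;       // time to first token
      tokensPerSec: number; // generation throughput
    }

    async function runBenchmark(
      models: CatalogModel[],
      prompts: PromptConfig[],
      log: (event: string) => void,
    ): Promise<RunResult[]> {
      const results: RunResult[] = [];
      // Strictly sequential: one inference at a time, so runs never
      // compete for GPU resources and timings stay comparable.
      for (const model of models) {
        for (const prompt of prompts) {
          for (let i = 1; i <= prompt.runs; i++) {
            try {
              results.push(await runOne(model.id, prompt));
              log(`ok: ${model.id} / ${prompt.id} run ${i}`);
            } catch (err) {
              // Skip-on-failure: record the event and keep going.
              log(`failed: ${model.id} / ${prompt.id} run ${i}: ${err}`);
            }
          }
        }
      }
      return results;
    }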

Note: benchmarks require WebGPU. If WebGPU is not available in your browser, switch to a WebGPU-enabled browser.
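WebGPU support can be feature-detected before enabling the Run step. navigator.gpu and requestAdapter() are the standard WebGPU entry points; TypeScript definitions for them (e.g. @webgpu/types) are assumed to be available:

    // Returns true only if the browser exposes WebGPU and an adapter
    // can actually be acquired (it may be null on unsupported hardware).
    async function webGpuAvailable(): Promise<boolean> {
      if (!("gpu" in navigator)) return false;
      try {
        const adapter = await navigator.gpu.requestAdapter();
        return adapter !== null;
      } catch {
        return false;
      }
    }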


4. Results

Results are aggregated as averages per model/prompt pair. Hover over a bar to inspect every individual run and the failed-run count.
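A sketch of that aggregation over the RunResult records from step 3, assuming every model/prompt pair attempted the same number of runs:

    // Group successful runs per model/prompt pair and average each metric.
    interface Aggregate {
      modelId: string;
      promptId: string;
      completedRuns: number;
      failedRuns: number;   // skipped failures leave no RunResult behind
      avgInferenceMs: number;
      avgTtftMs: number;
      avgTokensPerSec: number;
    }

    function aggregate(results: RunResult[], attemptedPerPair: number): Aggregate[] {
      const groups = new Map<string, RunResult[]>();
      for (const r of results) {
        const key = `${r.modelId}::${r.promptId}`;
        const bucket = groups.get(key);
        if (bucket) bucket.push(r); else groups.set(key, [r]);
      }
      const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
      return [...groups.values()].map((runs) => ({
        modelId: runs[0].modelId,
        promptId: runs[0].promptId,
        completedRuns: runs.length,
        failedRuns: attemptedPerPair - runs.length,
        avgInferenceMs: mean(runs.map((r) => r.inferenceMs)),
        avgTtftMs: mean(runs.map((r) => r.ttftMs)),
        avgTokensPerSec: mean(runs.map((r) => r.tokensPerSec)),
      }));
    }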

Three metrics can be charted: Inference Time (end-to-end duration per run), TTFT (time to first token), and Tokens/sec (generation throughput).
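Per run, all three metrics fall out of two timestamps and a token count. A sketch assuming a hypothetical token stream and the standard performance.now() clock; here Tokens/sec is measured over the decode phase after the first token, which is one common convention:

    // Time one streamed generation and derive all three metrics.
    async function timeRun(tokens: AsyncIterable<string>): Promise<{
      inferenceMs: number; ttftMs: number; tokensPerSec: number;
    }> {
      const start = performance.now();
      let firstTokenAt: number | null = null;
      let count = 0;
      for await (const _t of tokens) {
        if (firstTokenAt === null) firstTokenAt = performance.now(); // TTFT mark
        count++;
      }
      const end = performance.now();
      const inferenceMs = end - start;              // end-to-end duration per run
      const ttftMs = (firstTokenAt ?? end) - start; // time to first token
      const decodeMs = end - (firstTokenAt ?? end); // generation phase only
      return {
        inferenceMs,
        ttftMs,
        tokensPerSec: decodeMs > 0 ? (count * 1000) / decodeMs : 0,
      };
    }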


Execution log

Each run appends events to the execution log, including any failures skipped during sequential execution.