Local LLM benchmarking

Benchmark local models with repeatable prompt runs

Select one or more models from the prebuilt local catalog, define prompts with per-prompt generation settings, run sequential inference benchmarks, and compare averages and per-run details in interactive charts.

1. Model Selection

Choose one or more local models from the prebuilt catalog.
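A minimal sketch of what a catalog entry and the selection step might look like in TypeScript; the field names here are illustrative assumptions, not the app's actual schema:

    // Illustrative catalog entry; field names are assumptions. Real
    // catalogs typically also carry quantization and memory metadata.
    interface CatalogModel {
      id: string;          // e.g. "Llama-3.2-1B-Instruct-q4f16_1"
      displayName: string;
      sizeMB: number;      // approximate download size
    }

    // Resolve the user's selection against the prebuilt catalog.
    function selectModels(catalog: CatalogModel[], ids: string[]): CatalogModel[] {
      const wanted = new Set(ids);
      return catalog.filter((m) => wanted.has(m.id));
    }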

2. Prompt Setup

Create prompts and tune generation settings per prompt.
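One plausible shape for a prompt and its per-prompt settings; this is a sketch, and the field names and defaults are assumptions:

    // Illustrative prompt definition with per-prompt generation settings.
    interface PromptConfig {
      id: string;
      text: string;
      maxTokens: number;   // cap on generated tokens per run
      temperature: number; // 0 makes decoding greedy and more repeatable
      topP: number;        // nucleus-sampling cutoff
      runs: number;        // repetitions averaged in step 4
      seed?: number;       // optional fixed seed for repeatable sampling
    }

    const prompt1: PromptConfig = {
      id: "prompt-1",
      text: "Summarize the rules of chess in three sentences.",
      maxTokens: 256,
      temperature: 0.7,
      topP: 0.9,
      runs: 5,
    };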


3. Run

Runs execute sequentially, one inference at a time; a failed run is logged and skipped so the rest of the benchmark continues.
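A minimal sketch of that loop, reusing the CatalogModel and PromptConfig shapes above; runOne() is a hypothetical executor for a single model/prompt run:

    // Hypothetical single-run executor, assumed to throw on failure.
    declare function runOne(modelId: string, prompt: PromptConfig): Promise<RunResult>;

    interface RunResult {
      modelId: string;
      promptId: string;
      inferenceMs: number;  // end-to-end duration
      ttftMs: number;       // time to first token
      tokensPerSec: number; // generation throughput
    }

    async function runBenchmark(
      models: CatalogModel[],
      prompts: PromptConfig[],
      log: (event: string) => void,
    ): Promise<RunResult[]> {
      const results: RunResult[] = [];
      // Strictly sequential: one inference at a time, so runs never
      // compete for GPU resources and timings stay comparable.
      for (const model of models) {
        for (const prompt of prompts) {
          for (let i = 1; i <= prompt.runs; i++) {
            try {
              results.push(await runOne(model.id, prompt));
              log(`ok: ${model.id} / ${prompt.id} run ${i}`);
            } catch (err) {
              // Skip-on-failure: record the event and keep going.
              log(`failed: ${model.id} / ${prompt.id} run ${i}: ${err}`);
            }
          }
        }
      }
      return results;
    }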

Note: benchmarks require WebGPU. If WebGPU is not available in your browser, switch to a WebGPU-enabled browser.
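WebGPU support can be feature-detected before enabling the Run step. navigator.gpu and requestAdapter() are the standard WebGPU entry points; TypeScript definitions for them (e.g. @webgpu/types) are assumed to be available:

    // Returns true only if the browser exposes WebGPU and an adapter
    // can actually be acquired (it may be null on unsupported hardware).
    async function webGpuAvailable(): Promise<boolean> {
      if (!("gpu" in navigator)) return false;
      try {
        const adapter = await navigator.gpu.requestAdapter();
        return adapter !== null;
      } catch {
        return false;
      }
    }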


4. Results

Results are aggregated as averages per model/prompt pair. Hover over a bar to inspect every individual run and the failed-run count.
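A sketch of that aggregation over the RunResult records from step 3, assuming every model/prompt pair attempted the same number of runs:

    // Group successful runs per model/prompt pair and average each metric.
    interface Aggregate {
      modelId: string;
      promptId: string;
      completedRuns: number;
      failedRuns: number;   // skipped failures leave no RunResult behind
      avgInferenceMs: number;
      avgTtftMs: number;
      avgTokensPerSec: number;
    }

    function aggregate(results: RunResult[], attemptedPerPair: number): Aggregate[] {
      const groups = new Map<string, RunResult[]>();
      for (const r of results) {
        const key = `${r.modelId}::${r.promptId}`;
        const bucket = groups.get(key);
        if (bucket) bucket.push(r); else groups.set(key, [r]);
      }
      const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
      return [...groups.values()].map((runs) => ({
        modelId: runs[0].modelId,
        promptId: runs[0].promptId,
        completedRuns: runs.length,
        failedRuns: attemptedPerPair - runs.length,
        avgInferenceMs: mean(runs.map((r) => r.inferenceMs)),
        avgTtftMs: mean(runs.map((r) => r.ttftMs)),
        avgTokensPerSec: mean(runs.map((r) => r.tokensPerSec)),
      }));
    }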

Three metrics can be charted: Inference Time (end-to-end duration per run), TTFT (time to first token), and Tokens/sec (generation throughput).
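Per run, all three metrics fall out of two timestamps and a token count. A sketch assuming a hypothetical token stream and the standard performance.now() clock; here Tokens/sec is measured over the decode phase after the first token, which is one common convention:

    // Time one streamed generation and derive all three metrics.
    async function timeRun(tokens: AsyncIterable<string>): Promise<{
      inferenceMs: number; ttftMs: number; tokensPerSec: number;
    }> {
      const start = performance.now();
      let firstTokenAt: number | null = null;
      let count = 0;
      for await (const _t of tokens) {
        if (firstTokenAt === null) firstTokenAt = performance.now(); // TTFT mark
        count++;
      }
      const end = performance.now();
      const inferenceMs = end - start;              // end-to-end duration per run
      const ttftMs = (firstTokenAt ?? end) - start; // time to first token
      const decodeMs = end - (firstTokenAt ?? end); // generation phase only
      return {
        inferenceMs,
        ttftMs,
        tokensPerSec: decodeMs > 0 ? (count * 1000) / decodeMs : 0,
      };
    }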


Execution log

Each run appends events to the execution log, including any failures skipped during sequential execution.