Performance Benchmarks
Reproducible benchmarks validating Wraith Browser's performance claims — latency, memory, concurrency, and token savings with full methodology
Wraith Browser makes specific performance claims. This page backs each one with measured results, explains the methodology, and gives you everything you need to reproduce the numbers on your own hardware.
Claims and Evidence
| Claim | What we measured | Result | How to verify |
|---|---|---|---|
| ~50 ms per page | Engine-internal fetch time (HTTP + parse + snapshot) | 30-80 ms for most pages | bench_latency.sh |
| 8-12 MB per session | RSS of MCP server process divided by session count | 8-12 MB per session on Hetzner CX22 | bench_memory.sh |
| 1,000+ concurrent sessions on a $4 VPS | Active sessions on Hetzner CX22 (4 GB RAM, 2 vCPUs) | 1,000 sessions at 3.1 GB peak RSS (with swap) | bench_concurrent.sh |
| 77-99% token savings vs raw HTML | Snapshot characters vs raw HTML characters | 33-99% measured — 77-99% on production pages (Indeed, Lever, Greenhouse), 33-70% on lightweight test pages | bench_tokens.sh |
Important distinction: The ~50 ms claim refers to engine-internal page fetch time. The CLI round-trip (wraith-browser navigate <url>) includes process startup and Tokio runtime initialization, which adds ~100-200 ms of overhead. When running via the MCP server (the normal production path), this startup cost is paid once, not per request.
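The effect of that one-time startup cost can be illustrated with a toy amortization model (the 150 ms and 50 ms figures below are hypothetical, chosen only to roughly match the numbers above):

```python
def mean_latency_ms(pages, startup_ms=150.0, per_page_ms=50.0):
    """Hypothetical figures: ~150 ms process startup, ~50 ms engine time per page."""
    cli = startup_ms + per_page_ms                    # CLI restarts the process every call
    mcp = (startup_ms + pages * per_page_ms) / pages  # server pays startup once, amortized
    return cli, mcp

print(mean_latency_ms(1))    # → (200.0, 200.0)
print(mean_latency_ms(100))  # → (200.0, 51.5)
```

After ~100 pages through a long-running server, mean latency converges on the engine-internal time.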
Methodology
What these benchmarks measure
The CLI benchmarks measure end-to-end wall-clock time including:
- Process startup and Tokio runtime initialization
- Engine creation (Sevro native engine)
- HTTP fetch, TLS handshake, response parsing
- DOM construction and snapshot generation
- Process shutdown
The MCP server benchmarks measure the engine-internal time by sending requests to a long-running server process, eliminating per-request process startup.
What these benchmarks do NOT measure
- JavaScript execution performance (QuickJS is used, not V8)
- Visual rendering or screenshot accuracy
- WebSocket or streaming performance
- Performance under proxy or FlareSolverr configurations
Honest reporting principles
- No warmup hiding — warmup runs are counted separately and clearly labeled
- Percentiles over averages — p95 and p99 show tail latency, not just the happy path
- Failure counting — failed requests are tracked and reported, not silently dropped
- Network included — real network latency is included; HTTP is not mocked
- System info recorded — CPU, RAM, and OS are logged with every run
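The percentile reporting above can be sketched with a nearest-rank calculation (illustrative only; the sample latencies are hypothetical, not measured, and the actual bench scripts may compute percentiles differently):

```python
def percentile(samples, p):
    """Nearest-rank percentile: p is in [0, 100], samples must be non-empty."""
    ordered = sorted(samples)
    # ceil(p/100 * n) gives the 1-based rank; clamp to at least 1
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

latencies_ms = [48, 52, 50, 49, 77, 51, 50, 120, 53, 50]  # hypothetical run
print(percentile(latencies_ms, 50),
      percentile(latencies_ms, 95),
      percentile(latencies_ms, 99))  # → 50 120 120
```

Note how a single 120 ms outlier dominates p95 and p99 while leaving p50 untouched — this is why percentiles, not averages, are reported.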
Test Environment
Hetzner CX22 (primary benchmark target)
The headline numbers come from a Hetzner CX22 — a commodity VPS that anyone can spin up in minutes:
| Spec | Value |
|---|---|
| Cost | €3.79/mo (~$4.15 USD) |
| RAM | 4 GB |
| CPU | 2 AMD vCPUs |
| Storage | 40 GB SSD |
| OS | Ubuntu 22.04 |
| Swap | 4 GB (configured for high-session tests) |
Network conditions
All benchmarks run against real public websites over real network connections. We do not mock HTTP responses or use localhost by default. This means latency numbers include DNS resolution, TLS handshake, and internet round-trip time.
To isolate engine performance from network variability, the benchmarks support a local HTTP server mode — see the Reproduce It Yourself section.
Test URLs
Benchmarks use a curated list of public sites grouped by complexity:
| Category | URLs | Characteristics |
|---|---|---|
| Static / minimal | example.com, httpbin.org/html | Tiny payload, fast response |
| Lightweight dynamic | books.toscrape.com, quotes.toscrape.com | Server-rendered, small payloads |
| Medium complexity | httpbin.org/forms/post, the-internet.herokuapp.com/tables | More HTML, tables, forms |
| Heavier pages | the-internet.herokuapp.com, dynamic loading pages | Larger DOM, JS-dependent content |
These sites tolerate modest benchmark traffic without rate limiting.
Results
Latency
Measured on the Hetzner CX22 with 100 iterations per URL, 3 warmup runs discarded.
| Page type | p50 | p95 | p99 | Notes |
|---|---|---|---|---|
| Static (example.com) | ~50 ms | ~80 ms | ~120 ms | CLI round-trip; engine-internal ~30-50 ms |
| Lightweight dynamic (books.toscrape.com) | ~80 ms | ~150 ms | ~200 ms | Network-bound, not CPU-bound |
| Medium (httpbin forms, tables) | ~60 ms | ~120 ms | ~180 ms | Varies with page size |
| Average page load (1,000 session test) | 180 ms | Not yet benchmarked | Not yet benchmarked | Network-bound across mixed URLs |
The average page load of 180 ms from the 1,000-session test includes network latency to diverse targets. Engine-internal time (HTTP + parse + snapshot) is typically 30-80 ms for most pages.
Memory
Measured via RSS of the MCP server process on the Hetzner CX22.
| Sessions | Peak RSS | Per-session overhead | Swap used | Notes |
|---|---|---|---|---|
| 10 | ~200 MB | ~12 MB | 0 | Comfortable headroom |
| 50 | ~550 MB | ~9 MB | 0 | Well within 4 GB |
| 100 | ~950 MB | ~8 MB | 0 | Stable per-session cost |
| 500 | ~2.1 GB | ~8 MB | 0 | Comfortable without swap |
| 1,000 | ~3.1 GB | ~8 MB | ~1.8 GB | Swap in use, still functional |
Per-session memory converges to approximately 8 MB as session count increases. The baseline process overhead (~80-120 MB) is amortized across sessions.
Throughput
Measured on the Hetzner CX22 during the 1,000-session benchmark.
| Metric | Value |
|---|---|
| Time to create 1,000 sessions | 4.2 seconds |
| CPU usage during navigation bursts | 65-80% across both cores |
| Sessions per second (creation) | ~238 |
| Bottleneck | Network I/O, not CPU or memory |
For detailed pages-per-second measurements at varying concurrency levels, run bench_concurrent.sh on your own hardware — results depend heavily on network conditions. See Reproduce It Yourself.
Token Savings
This is one of Wraith's most important metrics for AI agent use cases. Every token sent to an LLM costs money, and raw HTML is extraordinarily wasteful. All character counts below are measured, not estimated; token counts are derived from them using the common approximation of ~4 characters per token.
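That character-based approximation can be sketched as follows (the 4-characters-per-token ratio is the heuristic stated above; real tokenizers vary by model):

```python
def approx_tokens(text: str) -> int:
    """Heuristic token count: roughly 1 token per 4 characters."""
    return len(text) // 4

def savings_pct(raw_html: str, snapshot: str) -> float:
    """Percent token reduction of a snapshot relative to raw HTML."""
    raw = approx_tokens(raw_html)
    snap = approx_tokens(snapshot)
    return 100.0 * (raw - snap) / raw

# Example: a snapshot one tenth the size of the raw page → 90% savings
print(savings_pct("x" * 40000, "y" * 4000))  # → 90.0
```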
Production pages (job boards, real-world sites)
These are the pages that matter for production workloads. Job boards, e-commerce, and content-heavy sites are where raw HTML bloat is most extreme.
| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| Indeed.com (job search results) | 464,610 | 13,838 | 97% |
| Lever (Anthropic careers page) | 74,304 | 504 | 99% |
| SimplyHired (job search results) | 101,860 | 414 | 99% |
| Greenhouse (Discord careers) | 17,267 | 3,859 | 77% |
| Wikipedia (main page) | 32,476 | 7,417 | 77% |
Lightweight test pages
Smaller, cleaner pages see lower savings — and very small pages can actually be larger in snapshot format due to @ref metadata overhead.
| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| books.toscrape.com | 12,823 | 3,817 | 70% |
| rust-lang.org | 4,650 | 2,435 | 47% |
| quotes.toscrape.com | 2,766 | 1,560 | 43% |
| news.ycombinator.com | 8,764 | 5,805 | 33% |
| example.com | 132 | 448 | -239% (snapshot larger) |
The pattern: Token savings scale with page complexity. Clean static HTML sees modest savings (30-70%). Real-world production pages with CSS, JS bundles, tracking scripts, and framework boilerplate see 77-99% savings.
Financial impact at scale
At $3/M input tokens (GPT-4 class pricing), an AI agent processing 3,000 Indeed job listings:
| Approach | Tokens per page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | 464,610 | 1.39B | $4,170 |
| Wraith snapshots | 13,838 | 41.5M | $124 |
| Savings | — | 1.35B | $4,046 (97%) |
For a Lever/Greenhouse career page scraping workflow hitting 1,000 pages:
| Approach | Avg tokens/page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | ~74,000 | 74M | $222 |
| Wraith snapshots | ~500 | 500K | $1.50 |
| Savings | — | 73.5M | $220 (99%) |
At scale — agents crawling, monitoring, or running agentic workflows hitting thousands of pages per day — the difference between raw HTML and Wraith snapshots is the difference between a viable product and a cost-prohibitive one.
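The cost rows above are straight per-token multiplication; a minimal sketch (token counts taken from the Indeed row, $3/M input tokens as stated; small differences from the table come from rounding total tokens before pricing):

```python
def workload_cost_usd(tokens_per_page, pages, usd_per_million_tokens=3.0):
    """Total LLM input cost for a scraping workload."""
    total_tokens = tokens_per_page * pages
    return total_tokens / 1_000_000 * usd_per_million_tokens

raw_cost = workload_cost_usd(464_610, 3_000)     # raw Indeed HTML
wraith_cost = workload_cost_usd(13_838, 3_000)   # Wraith snapshots
print(f"raw ${raw_cost:,.0f} vs wraith ${wraith_cost:,.0f}")
```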
Head-to-Head: Wraith vs Playwright (Chromium)
We ran a direct comparison on the same machine, same URLs, same conditions. No cherry-picking — every URL result is shown. The benchmark script is in the repo at benchmarks/bench_vs_playwright.js.
Test environment: Windows 10 Pro, Node v24.13.1, Playwright latest, Wraith 0.1.0 CLI mode. 5 iterations per URL. Chrome processes killed before start for clean memory baseline.
Latency Comparison (p50 ms)
| URL | Wraith | Playwright | Winner |
|---|---|---|---|
| example.com | 204 ms | 103 ms | Playwright (0.5x) |
| example.org | 245 ms | 97 ms | Playwright (0.4x) |
| httpbin.org/html | 514 ms | 309 ms | Playwright (0.6x) |
| books.toscrape.com | 653 ms | 784 ms | Wraith (1.2x) |
| books.toscrape.com/mystery | 640 ms | 801 ms | Wraith (1.3x) |
| quotes.toscrape.com | 535 ms | 315 ms | Playwright (0.6x) |
| quotes.toscrape.com/page/2 | 545 ms | 314 ms | Playwright (0.6x) |
| httpbin.org/forms/post | 508 ms | 308 ms | Playwright (0.6x) |
| httpbin.org/links/10/0 | 508 ms | 306 ms | Playwright (0.6x) |
| the-internet/tables | 530 ms | 1205 ms | Wraith (2.3x) |
| the-internet (home) | 525 ms | 1297 ms | Wraith (2.5x) |
| the-internet/dynamic_loading | 518 ms | 1283 ms | Wraith (2.5x) |
Analysis: Playwright wins on small/fast pages because Wraith's CLI mode pays ~200ms process startup per invocation. In MCP server mode (the normal production path), this startup cost is paid once. Wraith wins decisively on heavier pages — 2.3-2.5x faster on the-internet.herokuapp.com — where Chrome's rendering pipeline adds overhead that Wraith's native parser avoids entirely.
The honest story: For raw single-page latency, Playwright is competitive or faster on lightweight pages. Wraith's advantage is at scale — 50, 100, 1000 concurrent sessions — where per-session overhead dominates, and via MCP server mode where startup cost is amortized.
Memory Comparison (measured, not estimated)
Measured via RSS with zero Chrome processes at baseline. Playwright spawns chrome-headless-shell.exe processes.
| Sessions | Playwright total RSS | Per-session | Wraith total RSS | Per-session |
|---|---|---|---|---|
| 1 | 174 MB | 174 MB | 11 MB | 11 MB |
| 5 | 433 MB | 87 MB | 57 MB | 11 MB |
| 10 | 759 MB | 76 MB | 114 MB | 11 MB |
| 25 | 1,745 MB | 70 MB | 285 MB | 11 MB |
| 50 | 3,401 MB | 68 MB | 570 MB | 11 MB |
Wraith memory measured in CLI mode (per-process). MCP server mode uses ~8 MB/session with shared process overhead amortized.
At 50 sessions: Playwright uses 6x more memory than Wraith CLI, and 8.5x more than Wraith MCP server mode.
Projected to scale:
| Sessions | Playwright (68 MB/session) | Wraith MCP (8 MB/session) | Ratio |
|---|---|---|---|
| 100 | 6.8 GB | 0.8 GB | 8.5x |
| 500 | 34 GB | 4 GB | 8.5x |
| 1,000 | 68 GB | 8 GB | 8.5x |
A workload that requires a 68 GB instance with Playwright fits on a $4/mo Hetzner CX22 with Wraith.
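The projection rows are straight multiplication; a sketch using the per-session figures from the table (1 GB = 1000 MB here, matching the table's rounding):

```python
def fleet_rss_gb(sessions, per_session_mb):
    """Projected total RSS in GB for a fleet of identical sessions."""
    return sessions * per_session_mb / 1000

playwright = fleet_rss_gb(1000, 68)  # Playwright at 68 MB/session
wraith = fleet_rss_gb(1000, 8)       # Wraith MCP at 8 MB/session
print(playwright / wraith)  # → 8.5
```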
Where Chrome is still better
Wraith does not replace Chrome for every use case. Chrome-based tools are the better choice when you need:
- Full visual rendering — pixel-perfect screenshots of complex CSS layouts, PDF generation with visual fidelity
- Heavy JavaScript SPAs — single-page apps with complex client-side routing and framework-specific rendering that depend on V8's full JS engine
- Chrome-specific Web APIs — WebRTC, WebGL, Service Workers, and other APIs not implemented in Wraith's Sevro engine
- Cross-browser visual testing — Playwright's multi-browser support for UI regression testing
For the vast majority of automation work — navigating pages, extracting content, filling forms, managing sessions, and feeding data into AI agents — Wraith handles it at a fraction of the resource cost.
Reproduce the comparison
cd wraith-browser
npm install --prefix benchmarks playwright
npx --prefix benchmarks playwright install chromium
cargo build --release
node benchmarks/bench_vs_playwright.js

Results are saved to benchmarks/results/ as JSON.
Reproduce It Yourself
All benchmark scripts are in the repository. Clone, build, and run:
git clone https://github.com/suhteevah/wraith-browser
cd wraith-browser
cargo build --release

Run individual benchmarks
export WRAITH_BIN=./target/release/wraith-browser
# Latency: 100 iterations per URL, reports p50/p95/p99
./benchmarks/bench_latency.sh
# Memory: RSS at 10, 50, 100, 500 sessions
./benchmarks/bench_memory.sh
# Concurrency: throughput at 10, 50, 100, 500, 1000 sessions
./benchmarks/bench_concurrent.sh
# Token savings: raw HTML vs Wraith snapshot per URL
./benchmarks/bench_tokens.sh

Results are written to benchmarks/results/ with timestamps.
Customize the runs
| Variable | Default | Description |
|---|---|---|
| WRAITH_BIN | ./target/release/wraith-browser | Path to Wraith binary |
| ITERATIONS | 100 | Runs per URL (latency benchmark) |
| WARMUP | 3 | Warmup runs discarded |
| URL_FILE | benchmarks/test_urls.txt | URL list to test against |
| SESSION_COUNTS | "10 50 100 500" | Session counts for memory benchmark |
| CONCURRENCY_LEVELS | "10 50 100 500 1000" | Concurrency levels for throughput benchmark |
| OUTPUT_FORMAT | snapshot | Output format (snapshot, markdown, json) |
Isolate engine performance from network
To remove network variability, run benchmarks against a local HTTP server:
echo '<html><body><h1>Benchmark</h1><p>Test content.</p></body></html>' > /tmp/bench.html
cd /tmp && python3 -m http.server 8888 &
TEST_URL=http://localhost:8888/bench.html ./benchmarks/bench_latency.sh

Hardware requirements for reproducible results
- 4+ CPU cores (concurrent benchmarks need headroom)
- 8 GB RAM (memory benchmark at 500 sessions needs ~6 GB)
- Stable network connection (wired preferred over Wi-Fi)
- Linux (scripts use /proc, ps -o rss, and date +%s%N)
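The scripts read memory the standard Linux way; a minimal equivalent in Python (Linux only, since it parses /proc):

```python
import os

def rss_kb(pid: int) -> int:
    """Resident set size in kB, parsed from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # field is '<value> kB'
    raise ValueError("VmRSS not found")

print(rss_kb(os.getpid()))  # RSS of this Python process, in kB
```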
Close other heavy applications and run multiple times for consistent results.