Performance Benchmarks
Reproducible benchmarks validating Wraith Browser's performance claims — latency, memory, concurrency, and token savings with full methodology
Wraith Browser makes specific performance claims. This page backs each one with measured results, explains the methodology, and gives you everything you need to reproduce the numbers on your own hardware.
Claims and Evidence
| Claim | What we measured | Result | How to verify |
|---|---|---|---|
| ~50 ms per page | Engine-internal fetch time (HTTP + parse + snapshot) | 30-80 ms for most pages | bench_latency.sh |
| 8-12 MB per session | RSS of MCP server process divided by session count | 8-12 MB per session on Hetzner CX22 | bench_memory.sh |
| 1,000+ concurrent sessions on a $4 VPS | Active sessions on Hetzner CX22 (4 GB RAM, 2 vCPUs) | 1,000 sessions at 3.1 GB peak RSS (with swap) | bench_concurrent.sh |
| 77-99% token savings vs raw HTML | Snapshot characters vs raw HTML characters | 33-99% measured — 77-99% on production pages (Indeed, Lever, Greenhouse), 33-70% on lightweight test pages | bench_tokens.sh |
Important distinction: The ~50 ms claim refers to engine-internal page fetch time. The CLI round-trip (wraith-browser navigate <url>) includes process startup and Tokio runtime initialization, which adds ~100-200 ms of overhead. When running via the MCP server (the normal production path), this startup cost is paid once, not per request.
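The effect of that one-time startup cost can be illustrated with a toy amortization model (the 150 ms and 50 ms figures below are hypothetical, chosen only to roughly match the numbers above):

```python
def mean_latency_ms(pages, startup_ms=150.0, per_page_ms=50.0):
    """Hypothetical figures: ~150 ms process startup, ~50 ms engine time per page."""
    cli = startup_ms + per_page_ms                    # CLI restarts the process every call
    mcp = (startup_ms + pages * per_page_ms) / pages  # server pays startup once, amortized
    return cli, mcp

print(mean_latency_ms(1))    # → (200.0, 200.0)
print(mean_latency_ms(100))  # → (200.0, 51.5)
```

After ~100 pages through a long-running server, mean latency converges on the engine-internal time.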
Methodology
What these benchmarks measure
The CLI benchmarks measure end-to-end wall-clock time including:
- Process startup and Tokio runtime initialization
- Engine creation (Sevro native engine)
- HTTP fetch, TLS handshake, response parsing
- DOM construction and snapshot generation
- Process shutdown
The MCP server benchmarks measure the engine-internal time by sending requests to a long-running server process, eliminating per-request process startup.
What these benchmarks do NOT measure
- JavaScript execution performance (QuickJS is used, not V8)
- Visual rendering or screenshot accuracy
- WebSocket or streaming performance
- Performance under proxy or FlareSolverr configurations
Honest reporting principles
- No warmup hiding — warmup runs are counted separately and clearly labeled
- Percentiles over averages — p95 and p99 show tail latency, not just the happy path
- Failure counting — failed requests are tracked and reported, not silently dropped
- Network included — real network latency is included; HTTP is not mocked
- System info recorded — CPU, RAM, and OS are logged with every run
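The percentile reporting above can be sketched with a nearest-rank calculation (illustrative only; the sample latencies are hypothetical, not measured, and the actual bench scripts may compute percentiles differently):

```python
def percentile(samples, p):
    """Nearest-rank percentile: p is in [0, 100], samples must be non-empty."""
    ordered = sorted(samples)
    # ceil(p/100 * n) gives the 1-based rank; clamp to at least 1
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

latencies_ms = [48, 52, 50, 49, 77, 51, 50, 120, 53, 50]  # hypothetical run
print(percentile(latencies_ms, 50),
      percentile(latencies_ms, 95),
      percentile(latencies_ms, 99))  # → 50 120 120
```

Note how a single 120 ms outlier dominates p95 and p99 while leaving p50 untouched — this is why percentiles, not averages, are reported.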
Test Environment
Hetzner CX22 (primary benchmark target)
The headline numbers come from a Hetzner CX22 — a commodity VPS that anyone can spin up in minutes:
| Spec | Value |
|---|---|
| Cost | €3.79/mo (~$4.15 USD) |
| RAM | 4 GB |
| CPU | 2 AMD vCPUs |
| Storage | 40 GB SSD |
| OS | Ubuntu 22.04 |
| Swap | 4 GB (configured for high-session tests) |
Network conditions
All benchmarks run against real public websites over real network connections. We do not mock HTTP responses or use localhost by default. This means latency numbers include DNS resolution, TLS handshake, and internet round-trip time.
To isolate engine performance from network variability, the benchmarks support a local HTTP server mode — see the Reproduce It Yourself section.
Test URLs
Benchmarks use a curated list of public sites grouped by complexity:
| Category | URLs | Characteristics |
|---|---|---|
| Static / minimal | example.com, httpbin.org/html | Tiny payload, fast response |
| Lightweight dynamic | books.toscrape.com, quotes.toscrape.com | Server-rendered, small payloads |
| Medium complexity | httpbin.org/forms/post, the-internet.herokuapp.com/tables | More HTML, tables, forms |
| Heavier pages | the-internet.herokuapp.com, dynamic loading pages | Larger DOM, JS-dependent content |
These sites tolerate modest benchmark traffic without rate limiting.
Results
Latency
Measured on the Hetzner CX22 with 100 iterations per URL, 3 warmup runs discarded.
| Page type | p50 | p95 | p99 | Notes |
|---|---|---|---|---|
| Static (example.com) | ~50 ms | ~80 ms | ~120 ms | CLI round-trip; engine-internal ~30-50 ms |
| Lightweight dynamic (books.toscrape.com) | ~80 ms | ~150 ms | ~200 ms | Network-bound, not CPU-bound |
| Medium (httpbin forms, tables) | ~60 ms | ~120 ms | ~180 ms | Varies with page size |
| Average page load (1,000 session test) | 180 ms | Not yet benchmarked | Not yet benchmarked | Network-bound across mixed URLs |
The average page load of 180 ms from the 1,000-session test includes network latency to diverse targets. Engine-internal time (HTTP + parse + snapshot) is typically 30-80 ms for most pages.
Memory
Measured via RSS of the MCP server process on the Hetzner CX22.
| Sessions | Peak RSS | Per-session overhead | Swap used | Notes |
|---|---|---|---|---|
| 10 | ~200 MB | ~12 MB | 0 | Comfortable headroom |
| 50 | ~550 MB | ~9 MB | 0 | Well within 4 GB |
| 100 | ~950 MB | ~8 MB | 0 | Stable per-session cost |
| 500 | ~2.1 GB | ~8 MB | 0 | Comfortable without swap |
| 1,000 | ~3.1 GB | ~8 MB | ~1.8 GB | Swap in use, still functional |
Per-session memory converges to approximately 8 MB as session count increases. The baseline process overhead (~80-120 MB) is amortized across sessions.
Throughput
Measured on the Hetzner CX22 during the 1,000-session benchmark.
| Metric | Value |
|---|---|
| Time to create 1,000 sessions | 4.2 seconds |
| CPU usage during navigation bursts | 65-80% across both cores |
| Sessions per second (creation) | ~238 |
| Bottleneck | Network I/O, not CPU or memory |
For detailed pages-per-second measurements at varying concurrency levels, run bench_concurrent.sh on your own hardware — results depend heavily on network conditions. See Reproduce It Yourself.
Token Savings
This is one of Wraith's most important metrics for AI agent use cases. Every token sent to an LLM costs money, and raw HTML is extraordinarily wasteful. All character counts below are measured, not estimated; token counts are derived from them using the common approximation of ~4 characters per token.
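That character-based approximation can be sketched as follows (the 4-characters-per-token ratio is the heuristic stated above; real tokenizers vary by model):

```python
def approx_tokens(text: str) -> int:
    """Heuristic token count: roughly 1 token per 4 characters."""
    return len(text) // 4

def savings_pct(raw_html: str, snapshot: str) -> float:
    """Percent token reduction of a snapshot relative to raw HTML."""
    raw = approx_tokens(raw_html)
    snap = approx_tokens(snapshot)
    return 100.0 * (raw - snap) / raw

# Example: a snapshot one tenth the size of the raw page → 90% savings
print(savings_pct("x" * 40000, "y" * 4000))  # → 90.0
```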
Production pages (job boards, real-world sites)
These are the pages that matter for production workloads. Job boards, e-commerce, and content-heavy sites are where raw HTML bloat is most extreme.
| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| Indeed.com (job search results) | 464,610 | 13,838 | 97% |
| Lever (Anthropic careers page) | 74,304 | 504 | 99% |
| SimplyHired (job search results) | 101,860 | 414 | 99% |
| Greenhouse (Discord careers) | 17,267 | 3,859 | 77% |
| Wikipedia (main page) | 32,476 | 7,417 | 77% |
Lightweight test pages
Smaller, cleaner pages see lower savings — and very small pages can actually be larger in snapshot format due to @ref metadata overhead.
| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| books.toscrape.com | 12,823 | 3,817 | 70% |
| rust-lang.org | 4,650 | 2,435 | 47% |
| quotes.toscrape.com | 2,766 | 1,560 | 43% |
| news.ycombinator.com | 8,764 | 5,805 | 33% |
| example.com | 132 | 448 | -239% (snapshot larger) |
The pattern: Token savings scale with page complexity. Clean static HTML sees modest savings (30-70%). Real-world production pages with CSS, JS bundles, tracking scripts, and framework boilerplate see 77-99% savings.
Financial impact at scale
At $3/M input tokens (GPT-4 class pricing), an AI agent processing 3,000 Indeed job listings:
| Approach | Tokens per page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | 464,610 | 1.39B | $4,170 |
| Wraith snapshots | 13,838 | 41.5M | $124 |
| Savings | — | 1.35B | $4,046 (97%) |
For a Lever/Greenhouse career page scraping workflow hitting 1,000 pages:
| Approach | Avg tokens/page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | ~74,000 | 74M | $222 |
| Wraith snapshots | ~500 | 500K | $1.50 |
| Savings | — | 73.5M | $220 (99%) |
At scale — agents crawling, monitoring, or running agentic workflows hitting thousands of pages per day — the difference between raw HTML and Wraith snapshots is the difference between a viable product and a cost-prohibitive one.
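The cost rows above are straight per-token multiplication; a minimal sketch (token counts taken from the Indeed row, $3/M input tokens as stated; small differences from the table come from rounding total tokens before pricing):

```python
def workload_cost_usd(tokens_per_page, pages, usd_per_million_tokens=3.0):
    """Total LLM input cost for a scraping workload."""
    total_tokens = tokens_per_page * pages
    return total_tokens / 1_000_000 * usd_per_million_tokens

raw_cost = workload_cost_usd(464_610, 3_000)     # raw Indeed HTML
wraith_cost = workload_cost_usd(13_838, 3_000)   # Wraith snapshots
print(f"raw ${raw_cost:,.0f} vs wraith ${wraith_cost:,.0f}")
```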
Head-to-Head: Wraith vs Playwright (Chromium)
We ran a direct comparison on the same machine, same URLs, same conditions. No cherry-picking — every URL result is shown. The benchmark script is in the repo at benchmarks/bench_vs_playwright.js.
Test environment: Windows 10 Pro, Node v24.13.1, Playwright latest, Wraith 0.1.0 CLI mode. 5 iterations per URL. Chrome processes killed before start for clean memory baseline.
Latency Comparison (p50 ms)
| URL | Wraith | Playwright | Winner |
|---|---|---|---|
| example.com | 204 ms | 103 ms | Playwright (0.5x) |
| example.org | 245 ms | 97 ms | Playwright (0.4x) |
| httpbin.org/html | 514 ms | 309 ms | Playwright (0.6x) |
| books.toscrape.com | 653 ms | 784 ms | Wraith (1.2x) |
| books.toscrape.com/mystery | 640 ms | 801 ms | Wraith (1.3x) |
| quotes.toscrape.com | 535 ms | 315 ms | Playwright (0.6x) |
| quotes.toscrape.com/page/2 | 545 ms | 314 ms | Playwright (0.6x) |
| httpbin.org/forms/post | 508 ms | 308 ms | Playwright (0.6x) |
| httpbin.org/links/10/0 | 508 ms | 306 ms | Playwright (0.6x) |
| the-internet/tables | 530 ms | 1205 ms | Wraith (2.3x) |
| the-internet (home) | 525 ms | 1297 ms | Wraith (2.5x) |
| the-internet/dynamic_loading | 518 ms | 1283 ms | Wraith (2.5x) |
Analysis: Playwright wins on small/fast pages because Wraith's CLI mode pays ~200ms process startup per invocation. In MCP server mode (the normal production path), this startup cost is paid once. Wraith wins decisively on heavier pages — 2.3-2.5x faster on the-internet.herokuapp.com — where Chrome's rendering pipeline adds overhead that Wraith's native parser avoids entirely.
The honest story: For raw single-page latency, Playwright is competitive or faster on lightweight pages. Wraith's advantage is at scale — 50, 100, 1000 concurrent sessions — where per-session overhead dominates, and via MCP server mode where startup cost is amortized.
Memory Comparison (measured, not estimated)
Measured via RSS with zero Chrome processes at baseline. Playwright spawns chrome-headless-shell.exe processes.
| Sessions | Playwright total RSS | Per-session | Wraith total RSS | Per-session |
|---|---|---|---|---|
| 1 | 174 MB | 174 MB | 11 MB | 11 MB |
| 5 | 433 MB | 87 MB | 57 MB | 11 MB |
| 10 | 759 MB | 76 MB | 114 MB | 11 MB |
| 25 | 1,745 MB | 70 MB | 285 MB | 11 MB |
| 50 | 3,401 MB | 68 MB | 570 MB | 11 MB |
Wraith memory measured in CLI mode (per-process). MCP server mode uses ~8 MB/session with shared process overhead amortized.
At 50 sessions: Playwright uses 6x more memory than Wraith CLI, and 8.5x more than Wraith MCP server mode.
Projected to scale:
| Sessions | Playwright (68 MB/session) | Wraith MCP (8 MB/session) | Ratio |
|---|---|---|---|
| 100 | 6.8 GB | 0.8 GB | 8.5x |
| 500 | 34 GB | 4 GB | 8.5x |
| 1,000 | 68 GB | 8 GB | 8.5x |
A workload that requires a 68 GB instance with Playwright fits on a $4/mo Hetzner CX22 with Wraith.
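The projection rows are straight multiplication; a sketch using the per-session figures from the table (1 GB = 1000 MB here, matching the table's rounding):

```python
def fleet_rss_gb(sessions, per_session_mb):
    """Projected total RSS in GB for a fleet of identical sessions."""
    return sessions * per_session_mb / 1000

playwright = fleet_rss_gb(1000, 68)  # Playwright at 68 MB/session
wraith = fleet_rss_gb(1000, 8)       # Wraith MCP at 8 MB/session
print(playwright / wraith)  # → 8.5
```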
Where Chrome is still better
Wraith does not replace Chrome for every use case. Chrome-based tools are the better choice when you need:
- Full visual rendering — pixel-perfect screenshots of complex CSS layouts, PDF generation with visual fidelity
- Heavy JavaScript SPAs — single-page apps with complex client-side routing and framework-specific rendering that depend on V8's full JS engine
- Chrome-specific Web APIs — WebRTC, WebGL, Service Workers, and other APIs not implemented in Wraith's Sevro engine
- Cross-browser visual testing — Playwright's multi-browser support for UI regression testing
For the vast majority of automation work — navigating pages, extracting content, filling forms, managing sessions, and feeding data into AI agents — Wraith handles it at a fraction of the resource cost.
Reproduce the comparison
cd wraith-browser
npm install --prefix benchmarks playwright
npx --prefix benchmarks playwright install chromium
cargo build --release
node benchmarks/bench_vs_playwright.js

Results are saved to benchmarks/results/ as JSON.
Reproduce It Yourself
All benchmark scripts are in the repository. Clone, build, and run:
git clone https://github.com/suhteevah/wraith-browser
cd wraith-browser
cargo build --release

Run individual benchmarks
export WRAITH_BIN=./target/release/wraith-browser
# Latency: 100 iterations per URL, reports p50/p95/p99
./benchmarks/bench_latency.sh
# Memory: RSS at 10, 50, 100, 500 sessions
./benchmarks/bench_memory.sh
# Concurrency: throughput at 10, 50, 100, 500, 1000 sessions
./benchmarks/bench_concurrent.sh
# Token savings: raw HTML vs Wraith snapshot per URL
./benchmarks/bench_tokens.sh

Results are written to benchmarks/results/ with timestamps.
Customize the runs
| Variable | Default | Description |
|---|---|---|
| WRAITH_BIN | ./target/release/wraith-browser | Path to Wraith binary |
| ITERATIONS | 100 | Runs per URL (latency benchmark) |
| WARMUP | 3 | Warmup runs discarded |
| URL_FILE | benchmarks/test_urls.txt | URL list to test against |
| SESSION_COUNTS | "10 50 100 500" | Session counts for memory benchmark |
| CONCURRENCY_LEVELS | "10 50 100 500 1000" | Concurrency levels for throughput benchmark |
| OUTPUT_FORMAT | snapshot | Output format (snapshot, markdown, json) |
Isolate engine performance from network
To remove network variability, run benchmarks against a local HTTP server:
echo '<html><body><h1>Benchmark</h1><p>Test content.</p></body></html>' > /tmp/bench.html
cd /tmp && python3 -m http.server 8888 &
TEST_URL=http://localhost:8888/bench.html ./benchmarks/bench_latency.sh

Hardware requirements for reproducible results
- 4+ CPU cores (concurrent benchmarks need headroom)
- 8 GB RAM (memory benchmark at 500 sessions needs ~6 GB)
- Stable network connection (wired preferred over Wi-Fi)
- Linux (scripts use /proc, ps -o rss, and date +%s%N)
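The scripts read memory the standard Linux way; a minimal equivalent in Python (Linux only, since it parses /proc):

```python
import os

def rss_kb(pid: int) -> int:
    """Resident set size in kB, parsed from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # field is '<value> kB'
    raise ValueError("VmRSS not found")

print(rss_kb(os.getpid()))  # RSS of this Python process, in kB
```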
Close other heavy applications and run multiple times for consistent results.