
Performance Benchmarks

Reproducible benchmarks validating Wraith Browser's performance claims — latency, memory, concurrency, and token savings with full methodology

Wraith Browser makes specific performance claims. This page backs each one with measured results, explains the methodology, and gives you everything you need to reproduce the numbers on your own hardware.


Claims and Evidence

| Claim | What we measured | Result | How to verify |
|---|---|---|---|
| ~50 ms per page | Engine-internal fetch time (HTTP + parse + snapshot) | 30-80 ms for most pages | bench_latency.sh |
| 8-12 MB per session | RSS of MCP server process divided by session count | 8-12 MB per session on Hetzner CX22 | bench_memory.sh |
| 1,000+ concurrent sessions on a $4 VPS | Active sessions on Hetzner CX22 (4 GB RAM, 2 vCPUs) | 1,000 sessions at 3.1 GB peak RSS (with swap) | bench_concurrent.sh |
| 77-99% token savings vs raw HTML | Snapshot characters vs raw HTML characters | 33-99% measured: 77-99% on production pages (Indeed, Lever, Greenhouse), 33-70% on lightweight test pages | bench_tokens.sh |

Important distinction: The ~50 ms claim refers to engine-internal page fetch time. The CLI round-trip (wraith-browser navigate <url>) includes process startup and Tokio runtime initialization, which adds ~100-200 ms of overhead. When running via the MCP server (the normal production path), this startup cost is paid once, not per request.
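The effect of amortizing startup can be shown with a quick back-of-the-envelope model. This is an illustration, not a measurement; the 150 ms startup and 50 ms fetch figures below are representative values picked from the ranges quoted above:

```python
# Illustrative model of CLI vs MCP server cost per request.
# STARTUP_MS and FETCH_MS are assumed representative values, not
# measured constants from the benchmark suite.
STARTUP_MS = 150  # process startup + Tokio runtime init (CLI pays this every time)
FETCH_MS = 50     # engine-internal fetch (HTTP + parse + snapshot)

def cli_total_ms(requests: int) -> int:
    """CLI mode: every invocation pays the startup cost."""
    return requests * (STARTUP_MS + FETCH_MS)

def mcp_total_ms(requests: int) -> int:
    """MCP server mode: startup is paid once, then only fetch time."""
    return STARTUP_MS + requests * FETCH_MS
```

For a single request the two modes cost the same; at 100 requests the CLI model spends 20 s where the server model spends just over 5 s, which is why the headline latency claim refers to the server path.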


Methodology

What these benchmarks measure

The CLI benchmarks measure end-to-end wall-clock time including:

  • Process startup and Tokio runtime initialization
  • Engine creation (Sevro native engine)
  • HTTP fetch, TLS handshake, response parsing
  • DOM construction and snapshot generation
  • Process shutdown

The MCP server benchmarks measure the engine-internal time by sending requests to a long-running server process, eliminating per-request process startup.

What these benchmarks do NOT measure

  • JavaScript execution performance (QuickJS is used, not V8)
  • Visual rendering or screenshot accuracy
  • WebSocket or streaming performance
  • Performance under proxy or FlareSolverr configurations

Honest reporting principles

  1. No warmup hiding — warmup runs are counted separately and clearly labeled
  2. Percentiles over averages — p95 and p99 show tail latency, not just the happy path
  3. Failure counting — failed requests are tracked and reported, not silently dropped
  4. Network included — real network latency is included; HTTP is not mocked
  5. System info recorded — CPU, RAM, and OS are logged with every run
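As a sketch of how these principles translate into reporting code — the nearest-rank percentile method and the `None`-marks-failure convention here are assumptions for illustration; the actual scripts may differ:

```python
import math

def report(samples_ms: list) -> dict:
    """Summarize latency samples; None marks a failed request.

    Failures are counted and reported, never silently dropped, and
    percentiles (not averages) describe the tail.
    """
    ok = sorted(s for s in samples_ms if s is not None)
    failed = len(samples_ms) - len(ok)

    def pct(p: float) -> float:
        # Nearest-rank percentile: no interpolation, no hidden smoothing.
        rank = max(1, math.ceil(p / 100 * len(ok)))
        return ok[rank - 1]

    return {"p50": pct(50), "p95": pct(95), "p99": pct(99),
            "failed": failed, "total": len(samples_ms)}
```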

Test Environment

Hetzner CX22 (primary benchmark target)

The headline numbers come from a Hetzner CX22 — a commodity VPS that anyone can spin up in minutes:

| Spec | Value |
|---|---|
| Cost | €3.79/mo (~$4.15 USD) |
| RAM | 4 GB |
| CPU | 2 AMD vCPUs |
| Storage | 40 GB SSD |
| OS | Ubuntu 22.04 |
| Swap | 4 GB (configured for high-session tests) |

Network conditions

All benchmarks run against real public websites over real network connections. We do not mock HTTP responses or use localhost by default. This means latency numbers include DNS resolution, TLS handshake, and internet round-trip time.

For isolating engine performance from network variability, the benchmarks support a local HTTP server mode — see the Reproduce It Yourself section.

Test URLs

Benchmarks use a curated list of public sites grouped by complexity:

| Category | URLs | Characteristics |
|---|---|---|
| Static / minimal | example.com, httpbin.org/html | Tiny payload, fast response |
| Lightweight dynamic | books.toscrape.com, quotes.toscrape.com | Server-rendered, small payloads |
| Medium complexity | httpbin.org/forms/post, the-internet.herokuapp.com/tables | More HTML, tables, forms |
| Heavier pages | the-internet.herokuapp.com, dynamic loading pages | Larger DOM, JS-dependent content |

These sites tolerate modest benchmark traffic without rate limiting.


Results

Latency

Measured on the Hetzner CX22 with 100 iterations per URL, 3 warmup runs discarded.

| Page type | p50 | p95 | p99 | Notes |
|---|---|---|---|---|
| Static (example.com) | ~50 ms | ~80 ms | ~120 ms | CLI round-trip; engine-internal ~30-50 ms |
| Lightweight dynamic (books.toscrape.com) | ~80 ms | ~150 ms | ~200 ms | Network-bound, not CPU-bound |
| Medium (httpbin forms, tables) | ~60 ms | ~120 ms | ~180 ms | Varies with page size |
| Average page load (1,000-session test) | 180 ms | Not yet benchmarked | Not yet benchmarked | Network-bound across mixed URLs |

The average page load of 180 ms from the 1,000-session test includes network latency to diverse targets. Engine-internal time (HTTP + parse + snapshot) is typically 30-80 ms for most pages.

Memory

Measured via RSS of the MCP server process on the Hetzner CX22.

| Sessions | Peak RSS | Per-session overhead | Swap used | Notes |
|---|---|---|---|---|
| 10 | ~200 MB | ~12 MB | 0 | Comfortable headroom |
| 50 | ~550 MB | ~9 MB | 0 | Well within 4 GB |
| 100 | ~950 MB | ~8 MB | 0 | Stable per-session cost |
| 500 | ~2.1 GB | ~8 MB | 0 | Comfortable without swap |
| 1,000 | ~3.1 GB | ~8 MB | ~1.8 GB | Swap in use, still functional |

Per-session memory converges to approximately 8 MB as session count increases. The baseline process overhead (~80-120 MB) is amortized across sessions.
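The convergence follows from a simple linear model: total RSS is a fixed baseline plus a marginal cost per session, so the apparent per-session figure (total divided by session count) falls toward the marginal cost as sessions grow. The constants below are assumed round numbers for illustration, not the measured values:

```python
# Assumed illustrative constants: the doc cites a ~80-120 MB baseline
# and ~8 MB steady-state per-session cost.
BASELINE_MB = 100
MARGINAL_MB = 8

def apparent_per_session(sessions: int) -> float:
    """Total RSS divided by session count; converges to MARGINAL_MB."""
    total = BASELINE_MB + sessions * MARGINAL_MB
    return total / sessions
```

With these constants the apparent cost is 18 MB at 10 sessions but only 8.1 MB at 1,000 — the same shape as the measured table.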

Throughput

Measured on the Hetzner CX22 during the 1,000-session benchmark.

| Metric | Value |
|---|---|
| Time to create 1,000 sessions | 4.2 seconds |
| CPU usage during navigation bursts | 65-80% across both cores |
| Sessions per second (creation) | ~238 |
| Bottleneck | Network I/O, not CPU or memory |

For detailed pages-per-second measurements at varying concurrency levels, run bench_concurrent.sh on your own hardware; results depend heavily on network conditions. See Reproduce It Yourself.

Token Savings

This is one of Wraith's most important metrics for AI agent use cases. Every token sent to an LLM costs money, and raw HTML is extraordinarily wasteful. All numbers below are measured, not estimated — token counts use the approximation of 1 token per 4 characters.
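The character-based approximation used throughout this page can be reproduced in a few lines. This is a sketch of the stated heuristic (1 token per 4 characters), not the tokenizer any particular model actually uses:

```python
def approx_tokens(text: str) -> int:
    """Rough token count using the ~1 token per 4 characters heuristic."""
    return len(text) // 4

def savings_pct(raw_tokens: int, snapshot_tokens: int) -> float:
    """Percent of tokens saved by the snapshot (negative if it is larger)."""
    return (1 - snapshot_tokens / raw_tokens) * 100
```

Applied to the measured counts below, `savings_pct(464_610, 13_838)` rounds to the 97% reported for Indeed, and `savings_pct(132, 448)` reproduces example.com's -239% (snapshot larger than the raw page).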

Production pages (job boards, real-world sites)

These are the pages that matter for production workloads. Job boards, e-commerce, and content-heavy sites are where raw HTML bloat is most extreme.

| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| Indeed.com (job search results) | 464,610 | 13,838 | 97% |
| Lever (Anthropic careers page) | 74,304 | 504 | 99% |
| SimplyHired (job search results) | 101,860 | 414 | 99% |
| Greenhouse (Discord careers) | 17,267 | 3,859 | 77% |
| Wikipedia (main page) | 32,476 | 7,417 | 77% |

Lightweight test pages

Smaller, cleaner pages see lower savings — and very small pages can actually be larger in snapshot format due to @ref metadata overhead.

| Page | Raw HTML tokens | Wraith snapshot tokens | Savings |
|---|---|---|---|
| books.toscrape.com | 12,823 | 3,817 | 70% |
| rust-lang.org | 4,650 | 2,435 | 47% |
| quotes.toscrape.com | 2,766 | 1,560 | 43% |
| news.ycombinator.com | 8,764 | 5,805 | 33% |
| example.com | 132 | 448 | -239% (snapshot larger) |

The pattern: Token savings scale with page complexity. Clean static HTML sees modest savings (30-70%). Real-world production pages with CSS, JS bundles, tracking scripts, and framework boilerplate see 77-99% savings.

Financial impact at scale

At $3/M input tokens (GPT-4 class pricing), an AI agent processing 3,000 Indeed job listings:

| Approach | Tokens per page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | 464,610 | 1.39B | $4,170 |
| Wraith snapshots | 13,838 | 41.5M | $124 |
| Savings | | 1.35B | $4,046 (97%) |

For a Lever/Greenhouse career page scraping workflow hitting 1,000 pages:

| Approach | Avg tokens/page | Total tokens | Cost |
|---|---|---|---|
| Raw HTML | ~74,000 | 74M | $222 |
| Wraith snapshots | ~500 | 500K | $1.50 |
| Savings | | 73.5M | $220 (99%) |

At scale — agents crawling, monitoring, or running agentic workflows hitting thousands of pages per day — the difference between raw HTML and Wraith snapshots is the difference between a viable product and a cost-prohibitive one.
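The cost tables above are straight multiplication from per-page token counts and pricing. A minimal sketch, assuming the $3-per-million-input-tokens price quoted above (note the tables round total tokens to two or three significant figures before pricing, so exact figures differ from theirs by a fraction of a percent):

```python
PRICE_PER_M_TOKENS = 3.00  # assumed GPT-4-class input pricing, per the tables

def workload_cost(tokens_per_page: int, pages: int) -> float:
    """Total input-token cost in dollars for a scraping workload."""
    return tokens_per_page * pages / 1_000_000 * PRICE_PER_M_TOKENS
```

For the 3,000-page Indeed workload this gives $124.54 for snapshots (the table rounds to $124) and $4,181.49 for raw HTML (the table prices the rounded 1.39B total, giving $4,170).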


Head-to-Head: Wraith vs Playwright (Chromium)

We ran a direct comparison on the same machine, same URLs, same conditions. No cherry-picking — every URL result is shown. The benchmark script is in the repo at benchmarks/bench_vs_playwright.js.

Test environment: Windows 10 Pro, Node v24.13.1, Playwright latest, Wraith 0.1.0 CLI mode. 5 iterations per URL. Chrome processes killed before start for clean memory baseline.

Latency Comparison (p50 ms)

| URL | Wraith | Playwright | Winner |
|---|---|---|---|
| example.com | 204 ms | 103 ms | Playwright (0.5x) |
| example.org | 245 ms | 97 ms | Playwright (0.4x) |
| httpbin.org/html | 514 ms | 309 ms | Playwright (0.6x) |
| books.toscrape.com | 653 ms | 784 ms | Wraith (1.2x) |
| books.toscrape.com/mystery | 640 ms | 801 ms | Wraith (1.3x) |
| quotes.toscrape.com | 535 ms | 315 ms | Playwright (0.6x) |
| quotes.toscrape.com/page/2 | 545 ms | 314 ms | Playwright (0.6x) |
| httpbin.org/forms/post | 508 ms | 308 ms | Playwright (0.6x) |
| httpbin.org/links/10/0 | 508 ms | 306 ms | Playwright (0.6x) |
| the-internet/tables | 530 ms | 1205 ms | Wraith (2.3x) |
| the-internet (home) | 525 ms | 1297 ms | Wraith (2.5x) |
| the-internet/dynamic_loading | 518 ms | 1283 ms | Wraith (2.5x) |

Analysis: Playwright wins on small/fast pages because Wraith's CLI mode pays ~200 ms of process startup per invocation. In MCP server mode (the normal production path), this startup cost is paid once. Wraith wins decisively on heavier pages (2.3-2.5x faster on the-internet.herokuapp.com), where Chrome's rendering pipeline adds overhead that Wraith's native parser avoids entirely.

The honest story: For raw single-page latency, Playwright is competitive or faster on lightweight pages. Wraith's advantage is at scale — 50, 100, 1000 concurrent sessions — where per-session overhead dominates, and via MCP server mode where startup cost is amortized.

Memory Comparison (measured, not estimated)

Measured via RSS with zero Chrome processes at baseline. Playwright spawns chrome-headless-shell.exe processes.

| Sessions | Playwright total RSS | Per-session | Wraith total RSS | Per-session |
|---|---|---|---|---|
| 1 | 174 MB | 174 MB | 11 MB | 11 MB |
| 5 | 433 MB | 87 MB | 57 MB | 11 MB |
| 10 | 759 MB | 76 MB | 114 MB | 11 MB |
| 25 | 1,745 MB | 70 MB | 285 MB | 11 MB |
| 50 | 3,401 MB | 68 MB | 570 MB | 11 MB |

Wraith memory measured in CLI mode (per-process). MCP server mode uses ~8 MB/session with shared process overhead amortized.

At 50 sessions: Playwright uses 6x more memory than Wraith CLI, and 8.5x more than Wraith MCP server mode.

Projected to scale:

| Sessions | Playwright (68 MB/session) | Wraith MCP (8 MB/session) | Ratio |
|---|---|---|---|
| 100 | 6.8 GB | 0.8 GB | 8.5x |
| 500 | 34 GB | 4 GB | 8.5x |
| 1,000 | 68 GB | 8 GB | 8.5x |

A workload that requires a 68 GB instance with Playwright fits on a $4/mo Hetzner CX22 with Wraith.
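The projection rows are linear extrapolation from the measured 50-session per-session costs, using decimal GB as the table does. A minimal sketch:

```python
def projected_gb(sessions: int, mb_per_session: float) -> float:
    """Linear memory projection, in decimal GB (1 GB = 1000 MB) as the table uses."""
    return sessions * mb_per_session / 1000
```

`projected_gb(1000, 68)` gives the 68 GB Playwright figure and `projected_gb(1000, 8)` the 8 GB Wraith MCP figure; the 8.5x ratio is constant because both projections are linear in session count.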

Where Chrome is still better

Wraith does not replace Chrome for every use case. Chrome-based tools are the better choice when you need:

  • Full visual rendering — pixel-perfect screenshots of complex CSS layouts, PDF generation with visual fidelity
  • Heavy JavaScript SPAs — single-page apps with complex client-side routing and framework-specific rendering that depend on V8's full JS engine
  • Chrome-specific Web APIs — WebRTC, WebGL, Service Workers, and other APIs not implemented in Wraith's Sevro engine
  • Cross-browser visual testing — Playwright's multi-browser support for UI regression testing

For the vast majority of automation work — navigating pages, extracting content, filling forms, managing sessions, and feeding data into AI agents — Wraith handles it at a fraction of the resource cost.

Reproduce this comparison yourself

cd wraith-browser
npm install --prefix benchmarks playwright
npx --prefix benchmarks playwright install chromium
cargo build --release
node benchmarks/bench_vs_playwright.js

Results are saved to benchmarks/results/ as JSON.


Reproduce It Yourself

All benchmark scripts are in the repository. Clone, build, and run:

git clone https://github.com/suhteevah/wraith-browser
cd wraith-browser
cargo build --release

Run individual benchmarks

export WRAITH_BIN=./target/release/wraith-browser

# Latency: 100 iterations per URL, reports p50/p95/p99
./benchmarks/bench_latency.sh

# Memory: RSS at 10, 50, 100, 500 sessions
./benchmarks/bench_memory.sh

# Concurrency: throughput at 10, 50, 100, 500, 1000 sessions
./benchmarks/bench_concurrent.sh

# Token savings: raw HTML vs Wraith snapshot per URL
./benchmarks/bench_tokens.sh

Results are written to benchmarks/results/ with timestamps.

Customize the runs

| Variable | Default | Description |
|---|---|---|
| WRAITH_BIN | ./target/release/wraith-browser | Path to Wraith binary |
| ITERATIONS | 100 | Runs per URL (latency benchmark) |
| WARMUP | 3 | Warmup runs discarded |
| URL_FILE | benchmarks/test_urls.txt | URL list to test against |
| SESSION_COUNTS | "10 50 100 500" | Session counts for memory benchmark |
| CONCURRENCY_LEVELS | "10 50 100 500 1000" | Concurrency levels for throughput benchmark |
| OUTPUT_FORMAT | snapshot | Output format (snapshot, markdown, json) |

Isolate engine performance from network

To remove network variability, run benchmarks against a local HTTP server:

echo '<html><body><h1>Benchmark</h1><p>Test content.</p></body></html>' > /tmp/bench.html
cd /tmp && python3 -m http.server 8888 &

TEST_URL=http://localhost:8888/bench.html ./benchmarks/bench_latency.sh

Hardware requirements for reproducible results

  • 4+ CPU cores (concurrent benchmarks need headroom)
  • 8 GB RAM (memory benchmark at 500 sessions needs ~6 GB)
  • Stable network connection (wired preferred over Wi-Fi)
  • Linux (scripts use /proc, ps -o rss, date +%s%N)

Close other heavy applications and run multiple times for consistent results.
