Fold Engine — SQLite-WASM Browser Constraints Analysis¶

Substrate: one tenant · one tab · one SQLite-WASM instance · per-device shard · signed op-log · async relay. Prototype today (<1M ops, single device). Question: what breaks first at 50 devices / 100M total ops / production browsers including mobile?

Method note. This is a desk calculation grounded in the project's measured numbers (22,500 ops/s KernelOps.commitGroup; checkpoint bootstrap 0.90 ms vs genesis 47.70 ms; linear to 20M ops at 437 B/op; the 18 ms quorum-CAS window from W-QUORUM-CAS) plus the stated browser assumptions. Every assumption is named inline. It is not a fresh benchmark — the next step is to instrument the metrics in §5 and confirm on real devices. Companion to the Migrate & Compare paper § Disaster recovery & TCO.

This doc is reference, not a runnable lane. The mitigations below are executed from two prompt cards: tiered sharding (the OOM/genesis P0) → prompts/ERP_DATA_SHARDING_SESSION.md §QUEUED; everything else (checkpoint-first, compaction, OPFS/IDB, offline-queue, battery) → prompts/CONSTRAINT_MITIGATION_LANE.md. Both are pre-pilot hardening — run after the equivalence lane (GAP_CLOSURE_LANE.md), when a real multi-device/mobile pilot is imminent.

1. Executive summary — top 3 findings¶

Conflict probability is a non-issue at target scale; single-writer is the only HARD limit. At 50 devices × 1 op/s with a 0.1% shared op-class, same-second collision is ~0.12%, and within the real 18 ms CAS window it is ~4×10⁻⁷. You would need ~300–360 devices before same-second collision reaches 5%. The architectural single-writer-per-shard rule cannot be relaxed — but it is a feature boundary, not a scaling failure.
The genesis re-fold is the real cliff, and it is entirely a mobile problem. 100M-op genesis fold = 2.5 s desktop / 25 s mobile. Checkpoint bootstrap stays ~2–9 ms flat regardless of N (O(open-period), not O(N)). The danger is losing the checkpoint, not the op count. Mobile must never fall back to genesis on a cold start.
In-memory DB growth on mobile is the OOM you actually hit. iOS Safari kills tabs at ~500 MB total; the live DB must stay <~200 MB (sharing the tab with the WASM runtime + viewer). At 437 B/op that is ~480k open-period ops — reachable if a device stays offline across a long open period and compaction is tied only to period-close. Compaction must be size-triggered, independent of accounting periods.

2. Phase 1 — Calculated constraints¶

#	Constraint	Calculation	Result
1	Max in-memory DB before OOM	Desktop sql.js practical ≈1 GB → 2× margin = 500 MB; mobile under 500 MB tab-kill minus runtime ≈ 200 MB. At 437 B/op: 200 MiB/437 = 480k ops, 500 MiB/437 = 1.2M ops	200 MB mobile / 500 MB desktop; warn 100 MB. (Design keeps the live DB ~13 MB ≈ 31k ops — 6× under the mobile ceiling.)
2	Genesis re-fold @ 20M ops	(a) 20M×437 B = 8.74 GB (8.1 GiB) of log; (b) desktop 20M/40M ops/s = 0.50 s; (c) mobile 20M/4M = 5.0 s. General: desktop N/40M, mobile N/4M	1M: 0.025 s / 0.25 s · 10M: 0.25 s / 2.5 s · 100M: 2.5 s / 25 s
3	Checkpoint vs genesis bootstrap	Checkpoint = load 0.90 ms + fold open-period residual only (~31k ops at 13 MB). Desktop residual 31k/40M = 0.78 ms → ~1.7 ms; mobile 31k/4M = 7.8 ms → ~8.7 ms. Flat in N.	@100M: genesis 2,500 ms vs checkpoint 1.7 ms = 1,488× desktop; 25,000 ms vs 8.7 ms = 2,874× mobile
4	Concurrent-writer conflict	λ_shared = N × 1 op/s × 0.1% = 0.001N. P(≥2 in window) = 1 − e^−λ(1+λ). 50 dev, 1 s window: λ=0.05 → 0.12%. 18 ms CAS window: λ=9×10⁻⁴ → ~4×10⁻⁷. 5%/s threshold: λ≈0.36 → N≈360 devices	0.12%/s @ 50 dev; <1 ppm per CAS window. Not a near-term bottleneck
5	OPFS vs IndexedDB	Base 208 ms/1000 ops (sql.js+sha256) → IDB 4,808 ops/s. OPFS ÷2.5 = 83 ms/1000 → 12,048 ops/s. Measured batched `commitGroup` = 44 ms/1000 = 22,500 ops/s (beats both: one signature per group, not per op)	IDB 208 ms · OPFS 83 ms · batched 44 ms per 1000-op commit
6	Mobile	Battery: normal mode is bootstrap-once + 0-RTT in-memory reads → negligible; only a pathological re-fold loop costs the stated 5–10%/hr at 50% CPU. Memory: 200 MB DB ceiling (per #1). Offline queue: typical POS 100 ops/day = 43.7 KB/day; worst-case 1 op/s offline 24 h = 86,400 ops = 37.8 MB/day	Battery ✅ negligible · Memory ⚠️ 200 MB · Queue ✅ typical / ⚠️ high-rate

3. Phase 2 — Top 3 bottlenecks¶

Rank	Bottleneck	Numerical threshold	User-visible symptom	Hard / Soft
1	Mobile genesis fold	100M ops → 25 s mobile bootstrap if checkpoint absent/corrupt	~25 s white screen / spinner on cold start	SOFT — checkpoint bootstrap is ~9 ms flat; never genesis on mobile
2	Mobile in-memory OOM	DB > 200 MB (~480k open-period ops) when compaction is tied only to period-close	iOS Safari kills the tab; unsynced ops lost if not relayed	SOFT — size-triggered compaction decouples from the accounting period
3	Offline queue on high-rate device	37.8 MB/day at 1 op/s; unbounded if relay never reconnects	Storage quota exhaustion; eventual write failure	SOFT — max-queue cap + checkpoint-and-purge
—	(Single-writer per shard)	N/A — architectural	Second writer on same shard rejected	HARD — by design; a boundary, not a scaling failure

4. Phase 3 — Mitigation strategies¶

Bottleneck	Mitigation	LOC estimate	Priority
Genesis fold (mobile)	Mandatory checkpoint-first bootstrap: load signed checkpoint, fold only post-checkpoint delta; genesis only as explicit recovery, run in a Web Worker with progress UI (never block the main thread)	~120 LOC `kernel_ops.js` + ~40 worker shim	P0
In-memory OOM (mobile)	Size-triggered signed compaction: when `PRAGMA page_count` × page_size > 100 MB, fold open-period → new checkpoint, sign it (sha256 chain, prev-hash link), truncate folded ops. Independent of period-close	~200 LOC `kernel_ops.js`	P0
OPFS / IDB	VFS detection + auto-downgrade: feature-detect `navigator.storage.getDirectory` + COOP/COEP; use OPFS VFS when present (2.5×), fall back to IndexedDB VFS, log `§VFS chosen=…`. (GH-Pages has no COOP/COEP → IDB-only there)	~80 LOC bootstrap	P0
Offline queue	Bounded queue + purge policy: cap at 50 MB or 7 days; on cap, force a local checkpoint, purge folded+relayed ops, keep unrelayed ops; exponential-backoff relay retry (cap 5 min)	~140 LOC relay module	P1
Shared-resource conflict (the 0.1%)	Keep the proven quorum-CAS (`W-QUORUM-CAS`, 18 ms window, oversell=0) for stock/credit op-classes only; tag shared op-classes in the dictionary so the CAS path is taken only for them	~60 LOC (mostly tagging)	P1
Mobile battery / persistence	Battery-aware worker folding (`navigator.getBattery()` → throttle non-urgent folds when <20% & discharging) + service-worker cache of WASM/checkpoint for instant resume + `navigator.storage.persist()` to resist eviction	~90 LOC	P2

5. Phase 4 — Production-readiness scorecard¶

Constraint	Calculated limit	Current status	Mitigation	Residual risk
Max DB size	200 MB mobile / 500 MB desktop	✅ (13 MB live, 6× margin) + ✅ access-freq split implemented (warm/cold shards keep cold off the live DB; `sharding_boundaries.json` + `build/erp/shard_loader.js`, headless-witnessed `§SHARD-FREQ OVERALL=PASS`)	Size-triggered compaction	Low
Genesis fold	25 s @100M mobile	✅ checkpoint-first ENFORCED + genesis off main thread (`build/erp/bootstrap_path.js` + `genesis_worker.js`: default folds open-period only — work ∝ postOps not totalOps; mobile genesis runs in a Worker or REFUSES, never an inline freeze; `§BOOTSTRAP-PATH OVERALL=PASS`, heartbeat-proven non-blocking, falsifier bites)	Checkpoint-first + worker	Low
Checkpoint bootstrap	~9 ms flat	✅ within limit	(already strong)	Low
Writer conflict	0.12%/s @50 dev; 5% @~360 dev	✅ within limit + ✅ CAS scoped by dictionary tag (`build/erp/op_class_tags.js`: quorum-CAS routed only to tagged stock/credit classes — 0.10% of a 1000-op mix; `§OP-CLASS-TAG OVERALL=PASS`)	Quorum-CAS on tagged classes	Low
OPFS vs IDB	44 ms (batched) / 208 ms (IDB) per 1k	✅ VFS detect + IDB fallback implemented (`build/erp/vfs_detect.js`: OPFS only when API + `crossOriginIsolated`, else IDB; GH-Pages IDB-only named not silent; never silent-downgrades an OPFS-capable origin; `§VFS-DETECT OVERALL=PASS`)	VFS detect + downgrade	Med (hosting-bound: GH-Pages stays IDB)
Mobile memory	200 MB tab ceiling	✅ on-demand cold loader implemented (T2 cold `ATTACH` only on explicit demand, `DETACH` after 5-min idle — cold never resident at boot; `§SHARD-FREQ cold detachedAfterIdle=Y`, falsifier proven able to FAIL). Renderer wiring/deploy deferred (lane firewall).	Size-trigger + persist() + access-freq cold off-heap	Low
Offline queue	37.8 MB/day worst case	✅ typical + ✅ bounded queue implemented (`build/erp/offline_queue.js`: cap 50 MB/7 d → force checkpoint + purge folded&relayed, KEEP unrelayed; exp-backoff capped 5 min; `§OFFLINE-QUEUE OVERALL=PASS`, no-data-loss falsifier)	Cap + purge + backoff	Low
Single-writer	1 writer/shard	🔴 hard (immutable)	None — architectural	Hard-by-design

6. Phase 5 — Monitoring & alerting¶

Metric	How to collect	Alert threshold	Action
`db_file_size_mb`	`PRAGMA page_count` × `PRAGMA page_size`	> 100 MB	`compact()`
`quota_used_pct`	`navigator.storage.estimate()` (usage/quota)	> 70%	warn user + force purge
`open_period_ops`	counter since last checkpoint	> 300k	trigger checkpoint
`bootstrap_path`	log `genesis` vs `checkpoint` on init	any `genesis` on mobile	alert: checkpoint missing/corrupt
`vfs_backend`	record OPFS vs IDB at init	`idb` on a COOP/COEP origin	log misconfig
`offline_queue_mb`	size of the unrelayed op buffer	> 40 MB or age > 6 days	force checkpoint + purge folded
`cas_retry_rate`	failed/total CAS on shared classes	> 2%/min	back-pressure shared writes
`fold_ms_p95`	`performance.now()` around fold	> 1000 ms (mobile)	move fold to worker
`battery_pct`	`navigator.getBattery()`	< 20% & discharging	throttle non-urgent folds

7. Recommended next steps (ordered)¶

P0 — bootstrap_path metric + checkpoint-first enforcement. Cheapest, highest-leverage: guarantees mobile never eats the 25 s genesis.
P0 — size-triggered signed compaction (db_file_size_mb + open_period_ops triggers). Removes the only real mobile OOM path. Witness W-COMPACT-SIGNED maxDiff=0c (folded state == pre-compaction fold) + §FALSIFIER (skip the prev-hash link → chain breaks).
P0 — VFS detect + IDB fallback, with an explicit §VFS log; document the GH-Pages IDB-only reality so it is not mistaken for a regression.
P1 — bounded offline queue + backoff relay; witness queue cap + purge-keeps-unrelayed.
P1 — dictionary-tag the 0.1% shared op-classes so quorum-CAS fires only where needed (already proven, just needs scoping).
P2 — battery-aware worker folding + service-worker WASM/checkpoint cache for instant resume.

Honest hard limit (un-mitigable, by design): one writer per shard. Everything else is soft and within reach of the LOC above. The conflict math one might fear is the least of the worries — the genesis fold and the mobile memory ceiling bite first, and both are already ~90% handled by the existing checkpoint design; they just need enforcement and size-triggering.