Fold Engine — SQLite-WASM Browser Constraints Analysis
Substrate: one tenant · one tab · one SQLite-WASM instance · per-device shard · signed op-log · async relay. Prototype today (<1M ops, single device). Question: what breaks first at 50 devices / 100M total ops / production browsers including mobile?
Method note. This is a desk calculation grounded in the project's measured numbers (22,500 ops/s KernelOps.commitGroup; checkpoint bootstrap 0.90 ms vs genesis 47.70 ms; linear to 20M ops at 437 B/op; the 18 ms quorum-CAS window from W-QUORUM-CAS) plus the stated browser assumptions. Every assumption is named inline. It is not a fresh benchmark — the next step is to instrument the metrics in §5 and confirm on real devices. Companion to the Migrate & Compare paper § Disaster recovery & TCO.
This doc is reference, not a runnable lane. The mitigations below are executed from two prompt cards: tiered sharding (the OOM/genesis P0) → prompts/ERP_DATA_SHARDING_SESSION.md §QUEUED; everything else (checkpoint-first, compaction, OPFS/IDB, offline-queue, battery) → prompts/CONSTRAINT_MITIGATION_LANE.md. Both are pre-pilot hardening — run after the equivalence lane (GAP_CLOSURE_LANE.md), when a real multi-device/mobile pilot is imminent.
1. Executive summary — top 3 findings
- Conflict probability is a non-issue at target scale; single-writer is the only HARD limit. At 50 devices × 1 op/s with a 0.1% shared op-class, same-second collision is ~0.12%, and within the real 18 ms CAS window it is ~4×10⁻⁷. You would need ~300–360 devices before same-second collision reaches 5%. The architectural single-writer-per-shard rule cannot be relaxed — but it is a feature boundary, not a scaling failure.
- The genesis re-fold is the real cliff, and it is entirely a mobile problem. 100M-op genesis fold = 2.5 s desktop / 25 s mobile. Checkpoint bootstrap stays ~2–9 ms flat regardless of N (O(open-period), not O(N)). The danger is losing the checkpoint, not the op count. Mobile must never fall back to genesis on a cold start.
- In-memory DB growth on mobile is the OOM you actually hit. iOS Safari kills tabs at ~500 MB total; the live DB must stay <~200 MB (sharing the tab with the WASM runtime + viewer). At 437 B/op that is ~480k open-period ops — reachable if a device stays offline across a long open period and compaction is tied only to period-close. Compaction must be size-triggered, independent of accounting periods.
2. Phase 1 — Calculated constraints
| # |
Constraint |
Calculation |
Result |
| 1 |
Max in-memory DB before OOM |
Desktop sql.js practical ≈1 GB → 2× margin = 500 MB; mobile under 500 MB tab-kill minus runtime ≈ 200 MB. At 437 B/op: 200 MiB/437 = 480k ops, 500 MiB/437 = 1.2M ops |
200 MB mobile / 500 MB desktop; warn 100 MB. (Design keeps the live DB ~13 MB ≈ 31k ops — 6× under the mobile ceiling.) |
| 2 |
Genesis re-fold @ 20M ops |
(a) 20M×437 B = 8.74 GB (8.1 GiB) of log; (b) desktop 20M/40M ops/s = 0.50 s; (c) mobile 20M/4M = 5.0 s. General: desktop N/40M, mobile N/4M |
1M: 0.025 s / 0.25 s · 10M: 0.25 s / 2.5 s · 100M: 2.5 s / 25 s |
| 3 |
Checkpoint vs genesis bootstrap |
Checkpoint = load 0.90 ms + fold open-period residual only (~31k ops at 13 MB). Desktop residual 31k/40M = 0.78 ms → ~1.7 ms; mobile 31k/4M = 7.8 ms → ~8.7 ms. Flat in N. |
@100M: genesis 2,500 ms vs checkpoint 1.7 ms = 1,488× desktop; 25,000 ms vs 8.7 ms = 2,874× mobile |
| 4 |
Concurrent-writer conflict |
λ_shared = N × 1 op/s × 0.1% = 0.001N. P(≥2 in window) = 1 − e^−λ(1+λ). 50 dev, 1 s window: λ=0.05 → 0.12%. 18 ms CAS window: λ=9×10⁻⁴ → ~4×10⁻⁷. 5%/s threshold: λ≈0.36 → N≈360 devices |
0.12%/s @ 50 dev; <1 ppm per CAS window. Not a near-term bottleneck |
| 5 |
OPFS vs IndexedDB |
Base 208 ms/1000 ops (sql.js+sha256) → IDB 4,808 ops/s. OPFS ÷2.5 = 83 ms/1000 → 12,048 ops/s. Measured batched commitGroup = 44 ms/1000 = 22,500 ops/s (beats both: one signature per group, not per op) |
IDB 208 ms · OPFS 83 ms · batched 44 ms per 1000-op commit |
| 6 |
Mobile |
Battery: normal mode is bootstrap-once + 0-RTT in-memory reads → negligible; only a pathological re-fold loop costs the stated 5–10%/hr at 50% CPU. Memory: 200 MB DB ceiling (per #1). Offline queue: typical POS 100 ops/day = 43.7 KB/day; worst-case 1 op/s offline 24 h = 86,400 ops = 37.8 MB/day |
Battery ✅ negligible · Memory ⚠️ 200 MB · Queue ✅ typical / ⚠️ high-rate |
3. Phase 2 — Top 3 bottlenecks
| Rank |
Bottleneck |
Numerical threshold |
User-visible symptom |
Hard / Soft |
| 1 |
Mobile genesis fold |
100M ops → 25 s mobile bootstrap if checkpoint absent/corrupt |
~25 s white screen / spinner on cold start |
SOFT — checkpoint bootstrap is ~9 ms flat; never genesis on mobile |
| 2 |
Mobile in-memory OOM |
DB > 200 MB (~480k open-period ops) when compaction is tied only to period-close |
iOS Safari kills the tab; unsynced ops lost if not relayed |
SOFT — size-triggered compaction decouples from the accounting period |
| 3 |
Offline queue on high-rate device |
37.8 MB/day at 1 op/s; unbounded if relay never reconnects |
Storage quota exhaustion; eventual write failure |
SOFT — max-queue cap + checkpoint-and-purge |
| — |
(Single-writer per shard) |
N/A — architectural |
Second writer on same shard rejected |
HARD — by design; a boundary, not a scaling failure |
4. Phase 3 — Mitigation strategies
| Bottleneck |
Mitigation |
LOC estimate |
Priority |
| Genesis fold (mobile) |
Mandatory checkpoint-first bootstrap: load signed checkpoint, fold only post-checkpoint delta; genesis only as explicit recovery, run in a Web Worker with progress UI (never block the main thread) |
~120 LOC kernel_ops.js + ~40 worker shim |
P0 |
| In-memory OOM (mobile) |
Size-triggered signed compaction: when PRAGMA page_count × page_size > 100 MB, fold open-period → new checkpoint, sign it (sha256 chain, prev-hash link), truncate folded ops. Independent of period-close |
~200 LOC kernel_ops.js |
P0 |
| OPFS / IDB |
VFS detection + auto-downgrade: feature-detect navigator.storage.getDirectory + COOP/COEP; use OPFS VFS when present (2.5×), fall back to IndexedDB VFS, log §VFS chosen=…. (GH-Pages has no COOP/COEP → IDB-only there) |
~80 LOC bootstrap |
P0 |
| Offline queue |
Bounded queue + purge policy: cap at 50 MB or 7 days; on cap, force a local checkpoint, purge folded+relayed ops, keep unrelayed ops; exponential-backoff relay retry (cap 5 min) |
~140 LOC relay module |
P1 |
| Shared-resource conflict (the 0.1%) |
Keep the proven quorum-CAS (W-QUORUM-CAS, 18 ms window, oversell=0) for stock/credit op-classes only; tag shared op-classes in the dictionary so the CAS path is taken only for them |
~60 LOC (mostly tagging) |
P1 |
| Mobile battery / persistence |
Battery-aware worker folding (navigator.getBattery() → throttle non-urgent folds when <20% & discharging) + service-worker cache of WASM/checkpoint for instant resume + navigator.storage.persist() to resist eviction |
~90 LOC |
P2 |
5. Phase 4 — Production-readiness scorecard
| Constraint |
Calculated limit |
Current status |
Mitigation |
Residual risk |
| Max DB size |
200 MB mobile / 500 MB desktop |
✅ (13 MB live, 6× margin) + ✅ access-freq split implemented (warm/cold shards keep cold off the live DB; sharding_boundaries.json + build/erp/shard_loader.js, headless-witnessed §SHARD-FREQ OVERALL=PASS) |
Size-triggered compaction |
Low |
| Genesis fold |
25 s @100M mobile |
✅ checkpoint-first ENFORCED + genesis off main thread (build/erp/bootstrap_path.js + genesis_worker.js: default folds open-period only — work ∝ postOps not totalOps; mobile genesis runs in a Worker or REFUSES, never an inline freeze; §BOOTSTRAP-PATH OVERALL=PASS, heartbeat-proven non-blocking, falsifier bites) |
Checkpoint-first + worker |
Low |
| Checkpoint bootstrap |
~9 ms flat |
✅ within limit |
(already strong) |
Low |
| Writer conflict |
0.12%/s @50 dev; 5% @~360 dev |
✅ within limit + ✅ CAS scoped by dictionary tag (build/erp/op_class_tags.js: quorum-CAS routed only to tagged stock/credit classes — 0.10% of a 1000-op mix; §OP-CLASS-TAG OVERALL=PASS) |
Quorum-CAS on tagged classes |
Low |
| OPFS vs IDB |
44 ms (batched) / 208 ms (IDB) per 1k |
✅ VFS detect + IDB fallback implemented (build/erp/vfs_detect.js: OPFS only when API + crossOriginIsolated, else IDB; GH-Pages IDB-only named not silent; never silent-downgrades an OPFS-capable origin; §VFS-DETECT OVERALL=PASS) |
VFS detect + downgrade |
Med (hosting-bound: GH-Pages stays IDB) |
| Mobile memory |
200 MB tab ceiling |
✅ on-demand cold loader implemented (T2 cold ATTACH only on explicit demand, DETACH after 5-min idle — cold never resident at boot; §SHARD-FREQ cold detachedAfterIdle=Y, falsifier proven able to FAIL). Renderer wiring/deploy deferred (lane firewall). |
Size-trigger + persist() + access-freq cold off-heap |
Low |
| Offline queue |
37.8 MB/day worst case |
✅ typical + ✅ bounded queue implemented (build/erp/offline_queue.js: cap 50 MB/7 d → force checkpoint + purge folded&relayed, KEEP unrelayed; exp-backoff capped 5 min; §OFFLINE-QUEUE OVERALL=PASS, no-data-loss falsifier) |
Cap + purge + backoff |
Low |
| Single-writer |
1 writer/shard |
🔴 hard (immutable) |
None — architectural |
Hard-by-design |
6. Phase 5 — Monitoring & alerting
| Metric |
How to collect |
Alert threshold |
Action |
db_file_size_mb |
PRAGMA page_count × PRAGMA page_size |
> 100 MB |
compact() |
quota_used_pct |
navigator.storage.estimate() (usage/quota) |
> 70% |
warn user + force purge |
open_period_ops |
counter since last checkpoint |
> 300k |
trigger checkpoint |
bootstrap_path |
log genesis vs checkpoint on init |
any genesis on mobile |
alert: checkpoint missing/corrupt |
vfs_backend |
record OPFS vs IDB at init |
idb on a COOP/COEP origin |
log misconfig |
offline_queue_mb |
size of the unrelayed op buffer |
> 40 MB or age > 6 days |
force checkpoint + purge folded |
cas_retry_rate |
failed/total CAS on shared classes |
> 2%/min |
back-pressure shared writes |
fold_ms_p95 |
performance.now() around fold |
> 1000 ms (mobile) |
move fold to worker |
battery_pct |
navigator.getBattery() |
< 20% & discharging |
throttle non-urgent folds |
7. Recommended next steps (ordered)
- P0 —
bootstrap_path metric + checkpoint-first enforcement. Cheapest, highest-leverage: guarantees mobile never eats the 25 s genesis.
- P0 — size-triggered signed compaction (
db_file_size_mb + open_period_ops triggers). Removes the only real mobile OOM path. Witness W-COMPACT-SIGNED maxDiff=0c (folded state == pre-compaction fold) + §FALSIFIER (skip the prev-hash link → chain breaks).
- P0 — VFS detect + IDB fallback, with an explicit
§VFS log; document the GH-Pages IDB-only reality so it is not mistaken for a regression.
- P1 — bounded offline queue + backoff relay; witness queue cap + purge-keeps-unrelayed.
- P1 — dictionary-tag the 0.1% shared op-classes so quorum-CAS fires only where needed (already proven, just needs scoping).
- P2 — battery-aware worker folding + service-worker WASM/checkpoint cache for instant resume.
Honest hard limit (un-mitigable, by design): one writer per shard. Everything else is soft and within reach of the LOC above. The conflict math one might fear is the least of the worries — the genesis fold and the mobile memory ceiling bite first, and both are already ~90% handled by the existing checkpoint design; they just need enforcement and size-triggering.