The Rosetta Stone Strategy¶

Why This Exists¶

AI is blind to spatial geometry. It can parse text, generate code, and reason about logic, but it cannot see that a wall must sit on a slab, that a door must be inside a wall, or that two columns must not occupy the same space. No amount of prompt engineering fixes this: spatial correctness is a mathematical problem, not a language problem. The Rosetta Stone strategy exists to solve it deterministically: real buildings become ground truth, and every compiled output is proven against that truth with pure arithmetic. No heuristics. No tolerance tuning. No AI in the proof gates. If the coordinates match, the grammar is certified.

What This Is¶

35 buildings (34 extracted from real buildings + 1 generative), decomposed into reference DBs. The compiler reads a BOM describing the same building and produces output. The test: does every compiled element land at the same position as the reference?

Not the same dimensions. The same COORDINATES. Position in 3D space. A wall at (5.0, 0.0, 0.0) sized 4000x150x2700 must match a reference wall at (5.0, 0.0, 0.0) sized 4000x150x2700. Same place. Same size.
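
As a minimal sketch of what "same place, same size" means, the check below treats an element as a class, a position, and a size, and requires all three to be equal. The `Element` type, its field names, and the units (metres for position, millimetres for size) are assumptions made for illustration; they are not the reference DB schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Element:
    ifc_class: str
    position: Tuple[float, float, float]  # assumed metres
    size: Tuple[int, int, int]            # assumed millimetres (L, W, H)


def same_place_same_size(compiled: Element, reference: Element) -> bool:
    """Exact sameness: class, position, and size must all be equal."""
    return (
        compiled.ifc_class == reference.ifc_class
        and compiled.position == reference.position
        and compiled.size == reference.size
    )


reference = Element("IfcWall", (5.0, 0.0, 0.0), (4000, 150, 2700))
good = Element("IfcWall", (5.0, 0.0, 0.0), (4000, 150, 2700))
shifted = Element("IfcWall", (5.2, 0.0, 0.0), (4000, 150, 2700))

print(same_place_same_size(good, reference))     # True
print(same_place_same_size(shifted, reference))  # False: right size, wrong place
```

The `shifted` wall has the correct dimensions and still fails, which is the point of measuring coordinates rather than dimensions.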

The reference DB has every answer. Read it. Match it. But not by cheating or copying, which AI often does through hallucination or drift: the output must come out of the compilation process itself.

Stones¶

Stone	Type	Disciplines	Status
Sample House (SH)	UK residential, 1 storey	ARC, STR, CW	ALL GREEN
FZK Haus (FK)	European residential	ARC	ALL GREEN
Duplex (DX)	US residential, 2 storey	ARC+MEP+STR	REGRESSION — severe coordinate failure (S96)
Terminal (TE)	MY institutional, 4 storey	8 disciplines	ALL GREEN

ALL stones must pass. Not 3 of 4. Not "residential only."

Element counts and gate status: see PROGRESS.md. Full coverage table: TestArchitecture.md §Rosetta Stone Coverage.

Four Verification Tiers¶

Tier 1: VOCABULARY — "Do we have the right parts?" Dimensional signature match (category, L, W, H). Library coverage 100% all stones.

Tier 2: PLACEMENT — "Are the parts in the right places?" For each compiled element, find nearest reference element of same class. Measure centroid distance. TARGET: 100% within 1mm. This is the primary metric.

Tier 3: INTEGRITY — "Does the building work?" Connections (wall sits on slab, door sits in wall) and clashes (no overlapping solids). No reference needed — checks compiled output only.

Tier 4: COMPOSITIONAL — "Are proven words in valid sentences?" For composed buildings (no reference DB). See below.
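
The Tier 2 placement metric can be sketched as follows. This is an illustration under stated assumptions, not the project's test code: elements are reduced to `(class, centroid)` pairs, centroids are assumed to be in millimetres, and `placement_score` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def placement_score(compiled, reference, tol_mm=1.0):
    """Fraction of compiled elements whose nearest same-class
    reference centroid lies within tol_mm."""
    by_class: Dict[str, List[Point]] = {}
    for cls, centroid in reference:
        by_class.setdefault(cls, []).append(centroid)
    if not compiled:
        return 0.0
    hits = 0
    for cls, centroid in compiled:
        candidates = by_class.get(cls, [])
        if candidates and min(math.dist(centroid, c) for c in candidates) <= tol_mm:
            hits += 1
    return hits / len(compiled)


reference = [("IfcWall", (5000.0, 0.0, 0.0)), ("IfcDoor", (6000.0, 0.0, 0.0))]
compiled = [("IfcWall", (5000.0, 0.0, 0.4)), ("IfcDoor", (6000.0, 3.0, 0.0))]

print(placement_score(compiled, reference))  # 0.5: the door is 3 mm off
```

A brute-force nearest search is enough for a sketch; a spatial index would be the natural choice at the scale of a full building.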

Six Gates¶

Implemented in RosettaStoneGateTest.java, permanent in Maven surefire stage 2.

Gate	What it checks
G1-COUNT	Element count: reference = compiled
G2-VOLUME	Total AABB volume: reference = compiled (±0.1%)
G3-DIGEST	Per-element spatial SHA256 (SpatialDigest)
G4-TAMPER	Self-inspection via git history + source regex
G5-PROVENANCE	Every output element traced to library (material + geometry)
G6-ISOLATION	No cross-building contamination (styles, storeys, spaces scoped)

See BOMBasedCompilation.md §6 for gate rationale and methodology.
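
To make the first three gates concrete, here is a hedged sketch of G1 to G3. It does not reproduce RosettaStoneGateTest.java or the real SpatialDigest format: the element dictionaries, the integer-millimetre fields, and the `class|position|size` canonical string are all assumptions chosen for illustration.

```python
import hashlib


def aabb_volume(size_mm):
    length, width, height = size_mm
    return length * width * height


def g1_count(reference, compiled):
    """G1: element counts must be equal."""
    return len(reference) == len(compiled)


def g2_volume(reference, compiled, rel_tol=0.001):
    """G2: total AABB volume must agree within +/-0.1%."""
    ref_total = sum(aabb_volume(e["size"]) for e in reference)
    out_total = sum(aabb_volume(e["size"]) for e in compiled)
    return abs(out_total - ref_total) <= rel_tol * ref_total


def spatial_digest(element):
    """Hypothetical canonical form: class, then integer-mm position and size."""
    fields = [element["cls"], *map(str, element["pos"]), *map(str, element["size"])]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


def g3_digest(reference, compiled):
    """G3: the multiset of per-element digests must be identical."""
    return sorted(map(spatial_digest, reference)) == sorted(map(spatial_digest, compiled))


ref = [{"cls": "IfcWall", "pos": (5000, 0, 0), "size": (4000, 150, 2700)}]
moved = [{"cls": "IfcWall", "pos": (5001, 0, 0), "size": (4000, 150, 2700)}]

print(g1_count(ref, moved), g2_volume(ref, moved), g3_digest(ref, moved))
# True True False
```

The example shows why the gates are layered: a wall moved by one millimetre keeps the same count and volume, so only the per-element digest catches it.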

Rules¶

POSITION IS THE METRIC. Not dimensions. Not count. Position. Fix position first. Then dimensions. Never the reverse.
SCORE IS ARBITER. If Tier 2 drops after a change, revert. All stones every time.
FIX BUILDING SHAPE BEFORE ELEMENTS. If the footprint is wrong, every element inside inherits the error.
OVER-PRODUCTION IS A BUG. If the compiled output has MORE elements than the reference, the compiler is splitting elements when it shouldn't. A compiled-to-reference ratio > 1.5 counts as over-production.
EVERY VALUE FROM THE LIBRARY. No hardcoded dimensions. Every value reads from an AD table with a profile column.
CATALOG, DON'T FIX. Every extracted element goes into component_library.db as a reusable, profile-tagged catalog entry.
THREE-STONE REGRESSION. If one stone drops, the fix is overfit. Revert.
TACK CONVENTION. M_BOM_Line dx/dy/dz are parent-relative per BOMBasedCompilation.md §4.
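
The over-production rule above can be sketched as a ratio check. Computing the ratio per element class is an assumption of this sketch (the rule does not say whether the ratio is per class or for the whole building), and `over_production` is a hypothetical helper name.

```python
from collections import Counter


def over_production(reference_classes, compiled_classes, threshold=1.5):
    """Return {class: ratio} for classes whose compiled/reference
    count ratio exceeds the threshold."""
    ref = Counter(reference_classes)
    out = Counter(compiled_classes)
    flagged = {}
    for cls, n_out in out.items():
        n_ref = ref.get(cls, 0)
        # A class absent from the reference is treated as unbounded over-production.
        ratio = n_out / n_ref if n_ref else float("inf")
        if ratio > threshold:
            flagged[cls] = ratio
    return flagged


reference_classes = ["IfcWall"] * 4 + ["IfcDoor"] * 2
compiled_classes = ["IfcWall"] * 7 + ["IfcDoor"] * 2

print(over_production(reference_classes, compiled_classes))
# {'IfcWall': 1.75}
```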

The Rosetta Dictionary¶

Compositional Verification for buildings without a reference DB (S67).

When a Rosetta Stone passes exact sameness (G1-G6 ALL GREEN), its BOM becomes a dictionary entry. Every product, tack offset, and verb pattern is a proven word. A composed building is a sentence built from proven words.

Verification changes for composed buildings¶

	Extracted building	Composed building
Question	Does output == reference?	Is each fragment consistent with its source?
Needs	Full reference DB	Provenance + spatial invariants
Gate	G1-G6	G7-COMPOSITION

Four verification steps¶

PROVENANCE — trace each C_OrderLine to its source stone via family_ref → M_Product → source BOM
FRAGMENT FIDELITY — tack offsets, product dimensions, and verb patterns match the source stone's proven BOM
SPATIAL INVARIANTS — EYES proofs: roof covers structure, FP below ceiling, ELEC inside rooms, no escapees
CONTAINMENT — every element inside its spatial slot (M_BOM_Line AABB via dx/dy/dz), recursive
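
The CONTAINMENT step lends itself to a short recursive sketch. It assumes each slot is an axis-aligned box described by a parent-relative offset (dx, dy, dz) and a size; the `Slot` type and the `escapees` function are illustrative names, not the compiler's own classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Slot:
    name: str
    offset: Vec  # parent-relative dx, dy, dz
    size: Vec    # extent along x, y, z
    children: List["Slot"] = field(default_factory=list)


def escapees(slot, origin=(0.0, 0.0, 0.0)):
    """Names of all descendants whose AABB leaves their parent's AABB."""
    lo = tuple(o + d for o, d in zip(origin, slot.offset))
    hi = tuple(l + s for l, s in zip(lo, slot.size))
    found = []
    for child in slot.children:
        c_lo = tuple(l + d for l, d in zip(lo, child.offset))
        c_hi = tuple(l + s for l, s in zip(c_lo, child.size))
        inside = all(a <= b for a, b in zip(lo, c_lo)) and all(
            a <= b for a, b in zip(c_hi, hi)
        )
        if not inside:
            found.append(child.name)
        found.extend(escapees(child, lo))  # recurse with this slot's world origin
    return found


room = Slot("ROOM", (0, 0, 0), (4000, 3000, 2700), [
    Slot("BED", (500, 500, 0), (2000, 1500, 600)),
    Slot("WARDROBE", (3500, 0, 0), (1000, 600, 2100)),  # pokes 500 past the wall
])

print(escapees(room))  # ['WARDROBE']
```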

Fragment types and verifiers¶

Fragment type	Source	What to check
Proven (from Rosetta Stone)	Stone's BOM.db	Tack offset match, product dimensions, verb pattern
Rule-driven (FP/ELEC/ACMV)	ERP.db AD tables	Product exists, placement satisfies spatial rule, spacing correct
User-modified (ASI override)	output.db	EYES spatial invariants hold post-mutation
Freehand (viewport drawing)	output.db + M_BOM_Line dx/dy/dz	Containment, adjacency, no clashes
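
The "no clashes" check in the table above reduces to an interval test on each axis. The sketch below assumes boxes are given as `(min_corner, max_corner)` pairs in a shared world frame and that faces which merely touch do not count as a clash; both are assumptions, as are the function names.

```python
from itertools import combinations


def overlaps(a, b):
    """True when two AABBs share interior volume; touching faces do not count."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[i] < b_hi[i] and b_lo[i] < a_hi[i] for i in range(3))


def clashes(boxes):
    """All pairs of named boxes that overlap (brute force, O(n^2))."""
    return [
        (n1, n2)
        for (n1, b1), (n2, b2) in combinations(boxes.items(), 2)
        if overlaps(b1, b2)
    ]


boxes = {
    "COL-1": ((0, 0, 0), (300, 300, 3000)),
    "COL-2": ((150, 150, 0), (450, 450, 3000)),  # intersects COL-1
    "COL-3": ((300, 0, 0), (600, 100, 3000)),    # only touches COL-1
}

print(clashes(boxes))  # [('COL-1', 'COL-2')]
```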

Witnesses¶

W-COMP-PROV-1 — all fragments trace to a certified source
W-COMP-FRAG-1 — all fragments match their source's proven offsets
W-COMP-SPAT-1 — all spatial invariants hold
W-COMP-CONT-1 — all elements contained within their spatial slots

Compiled Construction vs Revit¶

Revit is a canvas: click, place, adjust, repeat. Every element is a manual act. This project is a compiler: you write intent, and the compiler produces the geometry. That intent has a language of its own, which we first defined as VERBS in BIM COBOL.

	Revit (authoring)	BIM COBOL (compiling)
Act	Place one element at a time	Formula generates thousands
Compliance	Checked AFTER (Solibri)	Enforced DURING compilation
BOM	Extracted AFTER	Generated WITH geometry
Reproducibility	Different architects → different files	Same source → identical output
Scale	50K elements = 50K manual acts	50K elements = ~50 formulas

Strategic position: BIM COBOL does not replace Revit. It replaces the manual repetitive work inside Revit — the 95.8% of elements that follow patterns. The target: high-repetition, rule-governed projects (terminals, mass housing, infrastructure).

See StrategicIndustryPositioning.md for the full competitive analysis.

Why Rosetta Stones Exist — The Training Set Thesis¶

The Rosetta Stones are not the product. They are the training set. Each proven compilation teaches the compiler a default path — the known-good route from intent to 3D output.

Once the default path is proven, everything else rides on it:

Editor rides the default path — Bonsai starts from a known-good compilation; users make macro-level changes (move wall, add bedroom, swap to timber frame)
BOM dictionary grows with each stone — every new building type adds proven resolutions to the dictionary
Compile-once-copy-many — proven arrangements become single lookups

Prior Art — Why Not Parametric BIM?¶

Dynamo, Grasshopper, and OpenBIM scripting are parametric — they generate geometry from parameters via visual programming. The BIM Compiler is not parametric. It is compilative: a BOM recipe is compiled into verified coordinates, the same way an ERP system explodes a manufacturing BOM into work orders. The distinction:

	Parametric BIM	BIM Compiler
Input	Parameters + visual graph	BOM recipe (M_BOM rows)
Output	Geometry (no proof)	Geometry + arithmetic proof chain
Verification	Manual inspection	6 mathematical gates (automated)
Reproducibility	Depends on plugin version + graph state	Deterministic — same BOM + library = same output
ERP integration	None (geometry tool)	Native — C_Order, M_Product, M_BOM

Speckle solves data transport (BIM ↔ cloud). It does not compile or verify. IFC.js and IfcOpenShell parse IFC files. They do not produce geometry from BOMs. The BIM Compiler is the only tool that takes a 1D BOM and produces verified 3D coordinates with a machine-checkable proof chain.

See Strategic Industry Positioning for the full competitive analysis and market positioning.

Why Nobody Else Can Self-Verify¶

The industry doesn't decompose real buildings into reusable BOMs. Existing tools either author from scratch (Revit), scan and compare snapshots (Navisworks), or classify finished models (Solibri). None of them can answer: "is this compiled wall in the same position as the extracted source wall, and can you prove it with a number?"

The Rosetta Stone approach enables self-verification because the round-trip is inherent to the architecture:

IFC file → EXTRACT → BOM dictionary (tack offsets + IFC GUIDs)
                              ↓
              COMPILE ← BOM dictionary
                              ↓
           output.db → VERIFY against extraction source

Each compiled element carries its IFC GUID through the BOM chain (m_bom_line_ma). At compile time, GEO debug mode (-Dbim.geo.debug=true) compares the compiled position against the extraction source for that GUID and emits MATCH or DRIFT with a millimetre delta per axis. No human arithmetic — the log is the verdict. See LMP §7.
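
The per-GUID comparison can be sketched in a few lines. This is not the GEO debug implementation or its log format: positions are assumed to be integer millimetres, MATCH is assumed to mean a zero delta on every axis, and the GUID strings below are placeholders.

```python
def geo_verdict(compiled_mm, source_mm):
    """Per-axis delta in millimetres plus a MATCH/DRIFT verdict."""
    delta = tuple(c - s for c, s in zip(compiled_mm, source_mm))
    verdict = "MATCH" if all(d == 0 for d in delta) else "DRIFT"
    return verdict, delta


def geo_report(compiled, source):
    """Compare positions keyed by IFC GUID.
    GUIDs missing from the extraction source are skipped in this sketch."""
    return {
        guid: geo_verdict(pos, source[guid])
        for guid, pos in compiled.items()
        if guid in source
    }


source = {"GUID-A": (5000, 0, 0), "GUID-B": (9000, 150, 0)}
compiled = {"GUID-A": (5000, 0, 0), "GUID-B": (9000, 152, 0)}

for guid, (verdict, delta) in geo_report(compiled, source).items():
    print(guid, verdict, delta)
# GUID-A MATCH (0, 0, 0)
# GUID-B DRIFT (0, 2, 0)
```

Because the delta is reported per axis, a DRIFT line says not only that an element moved but in which direction, which narrows the search for the offending tack offset.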

Why this is impossible without Rosetta Stones:

No BOM decomposition → no tack offsets → no recompilation. Revit stores geometry directly. There's no BOM to compile from, so there's no round-trip to verify.
No IFC GUID chain → no element-level traceability. Digital twin platforms carry GUIDs for lifecycle tracking, but they don't recompile geometry from BOMs. The GUID is a label, not a provenance proof.
No extraction source → no comparison target. Solibri checks rules on a model. It doesn't know what the model should look like — only what rules it should satisfy. The Rosetta Stone IS the comparison target.

The BOM dictionary is the learned relationship. The GEO proof is the evidence that the learning was preserved. Together they make this the only BIM tool that can compile a building from a recipe and prove every element landed where the source building says it should.

Cross-Domain Precedent — The Folding Problem¶

Construction is not the first domain to face this challenge. Two other fields solved the same abstract problem: inferring 3D spatial structure from a 1D specification by learning from solved examples.

Protein science (the closest cousin)¶

The protein folding problem consumed 50 years of research: given a sequence of amino acids (1D), predict the 3D folded structure. DeepMind's AlphaFold 2 reached near-experimental accuracy on it in 2020 (CASP14) by learning from the Protein Data Bank (PDB), whose experimentally solved structures (roughly 200,000 today) served as ground truth.

	Protein Science	BIM Compiler
1D input	Amino acid sequence	Construction Order (C_Order → C_OrderLine)
3D output	Folded protein structure	Compiled building (output.db)
Ground truth database	PDB (200K solved structures)	Rosetta Stones (35 buildings, growing)
What's learned	Bond angles, torsion, spatial motifs	Tack offsets, verb patterns, BOM recipes
Compilation step	Homology modelling / AlphaFold inference	BOM walk (PlacementCollectorVisitor)
Verification	RMSD against crystal structure / energy minimisation	GEO MATCH/DRIFT + G1-G6 gates
Reuse unit	Protein domain (reusable fold motif)	BOM assembly (reusable spatial recipe)

Template-based modelling in protein science works because spatial relationships transfer: a helix-turn-helix motif in one protein predicts the same fold in another protein with a similar sequence. Our BOM assemblies work the same way — a BED_SET tack arrangement from SH compiles correctly in any building with a bedroom of compatible dimensions.

Robotics (the mechanical cousin)¶

URDF (Unified Robot Description Format) decomposes a robot into links and joints with parent-child transforms. Forward kinematics accumulates these transforms through the chain to compute world positions — mathematically identical to PlacementCollectorVisitor's anchor stack. Robot calibration verifies computed positions against sensor readings — their version of GEO MATCH/DRIFT.
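
The forward-kinematics analogy can be sketched as a walk that accumulates parent-relative offsets into world positions. The sketch is translation-only (real kinematic chains and real placements may also carry rotations), it uses recursion where the text describes an anchor stack, and the node dictionaries are a hypothetical stand-in for BOM lines; it is not PlacementCollectorVisitor itself.

```python
def world_positions(node, anchor=(0.0, 0.0, 0.0), out=None):
    """Accumulate parent-relative (dx, dy, dz) offsets down the tree.
    The anchor argument plays the role of the top of the anchor stack."""
    out = {} if out is None else out
    world = tuple(a + d for a, d in zip(anchor, node["offset"]))
    out[node["name"]] = world
    for child in node.get("children", []):
        world_positions(child, world, out)
    return out


building = {
    "name": "BUILDING", "offset": (10.0, 20.0, 0.0), "children": [
        {"name": "STOREY-1", "offset": (0.0, 0.0, 3.0), "children": [
            {"name": "WALL-1", "offset": (5.0, 0.0, 0.0)},
        ]},
    ],
}

print(world_positions(building)["WALL-1"])  # (15.0, 20.0, 3.0)
```

Moving the building offset shifts every descendant by the same amount, which is why a wrong footprint propagates to every element inside it.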

Who does all four steps?¶

Domain	Decompose real thing?	Store spatial recipe?	Recompile?	Self-verify?
VLSI	—	Yes	Yes	Yes (DRC)
Automotive	—	Yes	Yes	Partial
Game engines	—	Yes	Yes	—
Robotics	Yes	Yes	Yes	Yes
Protein/Rosetta	Yes	Yes	Yes	Yes
Shipbuilding	—	Yes	Yes	—
BIM Compiler	Yes	Yes	Yes	Yes

Only robotics, protein science, and this compiler do all four. Construction is the last major engineering domain to gain a compilation model that learns from real structures.

The growth dynamic¶

The PDB grew from seven structures at its founding in 1971 to more than 200,000 roughly 50 years later. Each solved structure made the next prediction more accurate, because spatial motifs transfer. Our Rosetta Stone library is at 35 buildings. The same growth dynamic applies: each new building type adds proven tack arrangements, verb patterns, and product resolutions to the dictionary. A residential BED_SET, a commercial FP_RISER, an infrastructure bridge deck: each is a solved spatial motif that transfers to the next building of that type.

AlphaFold's breakthrough was proving that learned spatial relationships generalise. The BIM Compiler's thesis is the same: the spatial relationships in 35 real buildings, captured as BOM tack offsets, are sufficient to compile any building of similar type — and the GEO proof chain verifies it with millimetre precision.

Full historical record: Terminal decomposition phases (TE-1 through TE-8), score history, benchmark baselines, known gaps (resolved), testing code description, and synthetic Rosetta Stone details are preserved in archive/TheRosettaStoneStrategy_full.md.