
Terminal Recomposition — SJTII_Terminal Forensics

Foundation: BBC · DATA_MODEL · BIM_COBOL · MANIFESTO · TestArchitecture

48,428 elements across 8 disciplines — extraction proven, BOM persisted (S100-p69). IFCtoBOM pipeline complete: flat BOM (8 BOMs, 1,522 lines, 48,428 instances, 95.9× factorization). QA all PASS. Compile-path blockers identified — see §Compilation Status below.

CTFL Review Status (session 31-34, 2026-03-19)

Last reviewed: 2026-03-19 session 34 — CTFL static review + SRS gap analysis. Session 31: 10 defects found and fixed (D1-D10). Session 34: F1-F4 quick wins resolved, 4 SRS docs updated (12 new spec sections). Action: All numbers canonical. Two line counts: 1,522 flat extraction lines (IFCtoBOM output), 1,131 factored recipe lines (post-CLUSTER verb optimization). Banner reports extraction; §BOM Catalog reports factored.

Resolved issues (session 34):

| ID | Gate | What | Fix Applied | Status |
|---|---|---|---|---|
| F1 | G3-DIGEST | Seal check | Seal already INTACT; changed files not in sealed set | DONE |
| F2 | G5-PROVENANCE | IfcRampFlight 6 vertices | G5 Check 3 relaxed: vertex_count >= 4 (not 8). Ramp is a triangular prism, a valid shape. Check 6 (no GEO_ prefix) is the real parametric fallback guard. | DONE |
| F3 | C9 axis | 7 tie-breaking instabilities | VerbDetector sort: (X,Y,Z) → (X,Y,Z,W,D,H) | DONE |
| F4 | DemoHouseTest | 6 errors (empty DM_BOM.db) | Assumptions.assumeTrue() skip guard | DONE |
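
The F3 fix can be pictured as a six-key comparator: position first, then dimensions as tie-breakers, so two elements at the same position always sort the same way. This is a minimal sketch, not the actual VerbDetector code; the `Box` holder and `StableSortSketch` class are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical element holder: position (x, y, z) and size (w, d, h) in metres.
final class Box {
    final double x, y, z, w, d, h;

    Box(double x, double y, double z, double w, double d, double h) {
        this.x = x; this.y = y; this.z = z;
        this.w = w; this.d = d; this.h = h;
    }
}

final class StableSortSketch {
    // Six-key ordering (X,Y,Z,W,D,H): dimensions break ties between equal positions.
    static final Comparator<Box> ORDER = Comparator
            .comparingDouble((Box b) -> b.x)
            .thenComparingDouble(b -> b.y)
            .thenComparingDouble(b -> b.z)
            .thenComparingDouble(b -> b.w)
            .thenComparingDouble(b -> b.d)
            .thenComparingDouble(b -> b.h);

    // Returns a sorted copy; the input list is left untouched.
    static List<Box> sorted(List<Box> in) {
        List<Box> out = new ArrayList<>(in);
        out.sort(ORDER);
        return out;
    }
}
```

With only (X,Y,Z) as keys, two boxes at the same position would keep their input order, so the result would depend on extraction order. With the extra keys the output is the same for any input permutation.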

Spec alignment (2026-03-18, session 18): All TE BOM offsets, tack convention, BUFFER, and compilation modes must conform to BOMBasedCompilation.md §3-§4 (the governing spec). See §Tack I/O and §Recurrence sections below. Code changes are specified in ACTION_ROADMAP.md §Pre-Code Specs.

Extracted from docs/TheRosettaStoneStrategy.md §TERMINAL RECOMPOSITION (2026-02-28). Updated 2026-03-19 with CTFL review status and per-instance CLUSTER dimensions.

Building Identity

| Property | Value |
|---|---|
| Stone | 3 of 3 (largest) |
| Name | SJTII_Terminal (Sultan Johor Terminal II) |
| IFC version | IFC2x3 (federated from 9 discipline models) |
| Country | Malaysia |
| Type | Airport terminal, 4+ storeys, institutional |
| Elements | 48,428 (51,092 - 4 IfcSensor - 2,660 rebar; both Federation addons) |
| Disciplines | 8 (ARC, STR, FP, ACMV, CW, ELEC, SP, LPG); REB removed (Bonsai addon) |
| M_Product_Category | CO (Commercial) |
| C_DocType_ID | CO_TE |
| Reference DB | DAGCompiler/lib/input/Terminal_Extracted.db |

Why Terminal Is Different From SH/DX

SH/DX are residential. Their BOM tree shape (self-describing per BBC.md §1) is:

BUILDING → FLOOR → ROOM → SET → LEAF

Terminal is institutional. There are no "rooms" in the residential sense. Instead there are ZONES: departure hall, check-in counters, boarding gates, retail areas, mechanical rooms, roof structure. The BOM tree shape is:

BUILDING → STOREY → DISCIPLINE → ASSEMBLY → LEAF
The tree walker (getParentBOM()/getChildren()) handles both shapes — no vocabulary or level labels needed.

Also: SH/DX had 1-2 IFC source files. Terminal was federated from 9 discipline-specific models. The discipline boundaries are authoritative — they came from separate consultant firms.

Element Inventory by Discipline

| Discipline | Count | Dominant Classes |
|---|---|---|
| ARC | 34,724 | IfcPlate(33,324) Wall(330) Window(236) Furniture(176) |
| FP | 6,863 | PipeFitting(3,146) PipeSegment(2,672) FireSuppression(909) Alarm(80) |
| ~~REB~~ | ~~2,660~~ | ~~ReinforcingBar(2,660)~~ REMOVED (Bonsai Python addon, not construction BOM) |
| ACMV | 1,621 | DuctFitting(713) DuctSegment(568) AirTerminal(289) Proxy(51) |
| CW | 1,431 | PipeFitting(638) PipeSegment(619) FlowTerminal(106) Valve(57) |
| STR | 1,429 | Slab(614) Beam(432) Member(312) Column(68) Wall(3) |
| ELEC | 1,172 | LightFixture(814) Proxy(339) Appliance(19) |
| SP | 979 | PipeSegment(455) PipeFitting(372) FlowTerminal(150) Valve(2) |
| LPG | 209 | PipeFitting(87) PipeSegment(75) Valve(47) |
| Total (extracted) | 51,088 | All 9 original disciplines |
| Total (active) | 48,428 | After REB (2,660) removal; pipeline baseline |

ARC dominates at 72% of active elements — almost entirely IfcPlate roof tiles (33,324 = 69% of active).

Storey Structure

| Storey | Active Elements | Notes |
|---|---|---|
| Roof (RF) | 35,428 | Mostly IfcPlate metal deck tiles |
| Ground Floor (GF) | 3,513 | Check-in hall, main MEP |
| Level 1 (L1) | 2,070 | |
| Level 2 (L2) | 2,609 | |
| Level 3 (L3) | 1,798 | |
| Level 4 (L4) | 2,307 | |
| Foundation (FN) | 703 | Structural slabs, subgrade MEP |
| Total | 48,428 | After REB/IfcSensor removal |

RESOLVED (TE-1): Z-centroid band assignment normalised all storeys into 7 bands. Counts measured from BOM hierarchy (pipeline QA log, session 30).
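
Z-centroid band assignment can be sketched as a sorted-map lookup: each band is keyed by its lower Z bound, and an element goes to the band whose lower bound is the greatest one not above the element's vertical centroid. This is an illustrative sketch only. The `StoreyBands` class is hypothetical, and the real band limits live in the YAML `storey_bands` section, which is not reproduced in this document.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps an element's Z-centroid to a storey code.
final class StoreyBands {
    // Lower Z bound (metres) of each band -> storey code.
    private final TreeMap<Double, String> lowerBounds = new TreeMap<>();

    StoreyBands add(double lowerZ, String code) {
        lowerBounds.put(lowerZ, code);
        return this;
    }

    // The centroid of the element's vertical extent decides the band.
    // Assumes at least one band has been added.
    String assign(double minZ, double maxZ) {
        double centroid = (minZ + maxZ) / 2.0;
        Map.Entry<Double, String> band = lowerBounds.floorEntry(centroid);
        // Below the lowest bound: fall back to the lowest band.
        return band != null ? band.getValue() : lowerBounds.firstEntry().getValue();
    }
}
```

Using the centroid rather than minZ or maxZ means an element that straddles a band boundary lands in the band that holds most of its height.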

Factorization — The Scale Reduction

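| Discipline | Elements | Unique Types | Factor | Active? |
|---|---|---|---|---|
| ARC | 34,724 | 519 | 67× | yes |
| FP | 6,863 | 1,093 | | yes |
| ACMV | 1,621 | 543 | | yes |
| STR | 1,429 | 555 | | yes |
| ELEC | 1,172 | 401 | | yes |
| CW | 1,431 | 683 | | yes |
| SP | 979 | 566 | | yes |
| LPG | 209 | | | yes |
| ~~REB~~ | ~~2,660~~ | ~~73~~ | ~~36×~~ | REMOVED |
| Active total | 48,428 | 505 | 95.9× reuse | |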

"Unique Types" = distinct dimensional signatures (dx × dy × dz rounded to mm). Each unique type becomes one M_Product row. Each BOM line references a type with qty > 1 where instances repeat — the factored form.

The Roof Deck — TILE Verb

33,324 IfcPlate elements tile the roof surface in a regular grid.

Measured from Terminal_Extracted.db:
- Y-step: 150mm (plate depth, edge-to-edge; 3,774 of 3,819 pairs exact)
- X-step: 495mm (plate width, edge-to-edge; 35 of 43 pairs exact)

9 Z-bands of roof panels (Z = 18m to 28m). Each band is a horizontal surface at a different height. Within each band, plates tile in a regular 2D grid:

TILE SURFACE "ROOF_DECK_Z19" WITH "PLATE_500x150x106"
    PANEL "west"    ORIGIN (92.49, -42.16, 19.0)  GRID 15 x 294  STEP (495mm, 150mm)
    PANEL "central" ORIGIN (122.63, -42.16, 19.0) GRID 14 x 174  STEP (495mm, 150mm)
    PANEL "east"    ORIGIN (141.74, -42.16, 19.0) GRID 15 x 34   STEP (495mm, 150mm)
END-TILE

~20 TILE statements describe the entire roof (33,324 elements from ~20 formulas).
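
A PANEL clause can be expanded into plate positions with two nested loops. The sketch below assumes that `GRID a x b` means columns × rows and that STEP is given in millimetres while ORIGIN is in metres, as in the statement above. The `TilePanel` class is a hypothetical name, not the compiler's own type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one PANEL clause of a TILE SURFACE statement.
final class TilePanel {
    final double ox, oy, oz;    // ORIGIN in metres
    final int cols, rows;       // GRID cols x rows
    final double stepX, stepY;  // STEP converted to metres

    TilePanel(double ox, double oy, double oz, int cols, int rows, int stepXmm, int stepYmm) {
        this.ox = ox; this.oy = oy; this.oz = oz;
        this.cols = cols; this.rows = rows;
        this.stepX = stepXmm / 1000.0;
        this.stepY = stepYmm / 1000.0;
    }

    // Number of plates this panel stands for.
    int count() {
        return cols * rows;
    }

    // One {x, y, z} per plate, column index varying fastest.
    List<double[]> expand() {
        List<double[]> out = new ArrayList<>(count());
        for (int j = 0; j < rows; j++) {
            for (int i = 0; i < cols; i++) {
                out.add(new double[] { ox + i * stepX, oy + j * stepY, oz });
            }
        }
        return out;
    }
}
```

For the three Z19 panels above this gives 15×294 + 14×174 + 15×34 = 4,410 + 2,436 + 510 = 7,356 plates from a single statement.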

Formula Coverage — BIM COBOL Verb Patterns

Predicted (pre-implementation analysis) vs Actual (pipeline QA, session 30):

| Formula Pattern | BIM COBOL Verb | Predicted | Actual Verb | Actual Instances | Status |
|---|---|---|---|---|---|
| TILE (2D grid) | TILE SURFACE | 33,324 | TILE | 12 | LIVE (0.0m fidelity) |
| PATH (1D route) | ROUTE | 9,345 | ROUTE | 18 | LIVE (0.32m max) |
| GRID (structural) | FRAME | 590 | FRAME | 78 | LIVE (1.4mm max; S51 LBD fix) |
| Semi-regular grid | CLUSTER | | CLUSTER | 47,607 | LIVE (29.1m max, 3.7m avg; approximate) |
| Irregular (flat) | manual placement | 2,123 | flat | 770 | LIVE (exact) |
| ~~ARRAY (rebar)~~ | ~~ARRAY~~ | ~~2,660~~ | | | REMOVED (REB excluded) |
| Total | | | | 48,485 | 48,428 active |

Key finding: CLUSTER absorbed the bulk of elements that were predicted for TILE/ROUTE/WIRE/FRAME. CLUSTER uses offset-table grouping (semi-regular, ±10% tolerance), not exact grid formulas. This is the root cause of G2-VOLUME 13.71% drift — CLUSTER's average 3.7m positional error across 47,607 instances.

Path forward: Promote CLUSTER groups to exact verbs (TILE, ROUTE, FRAME) where the underlying pattern is truly regular. Non-uniform groups stay CLUSTER.

Predicted vs Actual BOM Hierarchy

Predicted (pre-implementation, 5-level with assembly groupings):

Level 0: BUILDING_TE_STD (BUILDING, M_Product_Category=CO)
├── Level 1: TERMINAL_TE_GF (FLOOR) — Ground, ~4166 elements
├── ...
└── Each FLOOR contains:
    ├── Level 2: ARC_TE_LXX (DISCIPLINE)
    │   ├── Level 3: WALL_SET — ~80 walls/storey
    │   ├── Level 3: OPENING_SET — doors+windows hosted on walls
    │   ├── Level 3: FURNITURE_SET — zone furniture     ← predicted assembly groupings
    │   └── Level 3: MISC_ARC — coverings, railings, stairs
    ├── Level 2: STR_TE_LXX (DISCIPLINE)
    │   ├── Level 3: FRAME — beams + columns + members
    │   └── Level 3: SLAB_SET — structural slabs
    ├── Level 2: FP_TE_LXX (DISCIPLINE)
    │   ├── Level 3: FP_PIPE_RUN — pipe segments + fittings
    │   └── Level 3: SPRINKLER_SET — fire suppression terminals
    ├── Level 2: ACMV_TE_LXX (DISCIPLINE)
    │   ├── Level 3: DUCT_RUN — duct segments + fittings
    │   └── Level 3: AIR_TERM_SET — air terminals
    ├── Level 2: ELEC_TE_LXX (DISCIPLINE)
    │   ├── Level 3: LIGHTING_SET — light fixtures (qty factorized)
    │   └── Level 3: EQUIP — proxies + appliances
    ├── Level 2: CW_TE_LXX / SP_TE_LXX / LPG_TE_LXX
    │   └── (same pattern: pipe runs + fixture sets)
    └── ...

Actual (pipeline QA, session 30 — 3-level flat, no assembly groupings):

Level 0: BUILDING_TE_STD (1 BOM, origin 84.6, -51.2, -30.7)
├── Level 1: TE Foundation  (FLOOR, FN)  — 6 discipline SETs,   703 instances
├── Level 1: TE Ground Floor (FLOOR, GF) — 8 discipline SETs, 3,513 instances
├── Level 1: TE Level 1     (FLOOR, L1)  — 8 discipline SETs, 2,070 instances
├── Level 1: TE Level 2     (FLOOR, L2)  — 7 discipline SETs, 2,609 instances
├── Level 1: TE Level 3     (FLOOR, L3)  — 7 discipline SETs, 1,798 instances
├── Level 1: TE Level 4     (FLOOR, L4)  — 7 discipline SETs, 2,307 instances
└── Level 1: TE Roof        (FLOOR, RF)  — 7 discipline SETs, 35,428 instances
                                                               ------
                                                               48,428 total

D8 gap: Predicted Level 3 assembly groupings (WALL_SET, OPENING_SET, etc.) were NOT implemented. CO path uses BUILDING→FLOOR→DISCIPLINE(flat leaves). No scope-space decomposition within disciplines. The predicted 5-level hierarchy collapsed to 3 effective levels. This is correct for extraction (positions come from IFC, no need to group by assembly), but generative mode will need the assembly sub-groupings. See BBC.md §2.1.1 for the decomposition layers that would add Level 3.

BOM Factorization — DONE (42.8:1 Compression)

Status: FACTORED. 48,485 instances → 1,131 recipe lines (sessions 8-11, CLUSTER optimisation). 4 verbs: TILE/ROUTE/FRAME/CLUSTER. Verbs eliminate 97.6% of the per-instance lines, leaving 361 verb formulas + 770 flat lines. See BIM_COBOL.md §19 for detection algorithms. (History: 1,442 lines pre-CLUSTER → 1,297 post-SPRAY → 1,131 post-CLUSTER rename.)

Current Sizings (measured 2026-03-18)

BOM Catalog (TE_BOM.db) — 58 BOMs, 1,131 lines (factored via CLUSTER)

| Table | Before factorization | After factorization | Notes |
|---|---|---|---|
| m_bom (tree nodes) | 58 | 58 | 1 root + 7 floor-level + 50 leaf-group BOMs |
| m_bom_line (edges) | 48,485 | 1,131 | 361 verb lines + 770 flat lines |
| M_Product | 563 | 563 | 505 catalog + 58 assembly stubs |

Verb Factorization Breakdown

| Verb | Recipe lines | Expanded instances | Ratio | What |
|---|---|---|---|---|
| CLUSTER | 354 | 47,607 | 134:1 | MEP semi-regular grids (sprinklers, pipes, ducts, lights) |
| TILE | 3 | 12 | 4:1 | 2D uniform grid (roof plate panels) |
| FRAME | 2 | 78 | 39:1 | Grid intersections (structural bays) |
| ROUTE | 2 | 18 | 9:1 | Axis-aligned uniform-step runs |
| Verb subtotal | 361 | 47,715 | 132:1 | |
| Flat (no verb) | 770 | 770 | 1:1 | Irregular placements (furniture, proxies, unique fittings) |
| Total | 1,131 | 48,485 | 42.8:1 | |

Verb savings: Without verbs, TE_BOM.db would need 48,485 per-instance lines. Verbs eliminate 47,354 lines — 97.6% of the BOM is encoded by 361 verb formulas. CLUSTER alone saves 47,253 lines (the bulk). The 770 flat lines are the irreducible core: unique placements where no pattern exists (furniture, proxies, one-off fittings).
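
The savings figures follow directly from the breakdown table: lines saved = expanded instances minus recipe lines. A small sketch of that arithmetic, with a hypothetical `VerbSavings` helper:

```java
// Hypothetical helper reproducing the savings arithmetic from the breakdown table.
final class VerbSavings {
    // Per-instance lines that no longer need to be stored.
    static int linesSaved(int expandedInstances, int recipeLines) {
        return expandedInstances - recipeLines;
    }

    // Share of the per-instance BOM that is eliminated, in percent.
    static double percentSaved(int expandedInstances, int recipeLines) {
        return 100.0 * linesSaved(expandedInstances, recipeLines) / expandedInstances;
    }
}
```

`linesSaved(48485, 1131)` gives the 47,354 total and `linesSaved(47607, 354)` gives the 47,253 attributed to CLUSTER. `percentSaved(48485, 1131)` is about 97.67, which the text reports as 97.6%.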

Note: TILE shows only 3/12 because CLUSTER absorbed most of the 33,324 roof plates. The original TILE prediction (~20 formulas for 33K plates) was superseded by CLUSTER's offset-table approach which is more general.

Top 10 Discipline BOMs by Instance Count

| BOM | Category | Recipe lines | Instances | Ratio |
|---|---|---|---|---|
| TE Roof ARC | ARC | 37 | 33,417 | 903:1 |
| TE Roof FP | FP | 16 | 1,652 | 103:1 |
| TE Level 1 FP | FP | 10 | 1,185 | 119:1 |
| TE Ground Floor FP | FP | 21 | 1,132 | 54:1 |
| TE Level 4 FP | FP | 13 | 1,072 | 82:1 |
| TE Level 2 FP | FP | 23 | 1,064 | 46:1 |
| TE Level 3 FP | FP | 19 | 754 | 40:1 |
| TE Ground Floor CW | CW | 37 | 754 | 20:1 |
| TE Ground Floor ARC | ARC | 116 | 584 | 5:1 |
| TE Level 4 ACMV | ACMV | 21 | 471 | 22:1 |

Roof ARC dominates: 37 recipe lines → 33,417 instances (903:1) = the metal deck tile panels. This is the single most compressed BOM in the system.

FP (fire protection) is the most recurring: 7 floor BOMs, each with high compression ratios — sprinkler grids and pipe runs are regular patterns.

Ground Floor ARC is the least compressed: 116 lines for 584 instances (5:1). This is the terminal's check-in hall — walls, doors, windows, furniture, railings, stairs. These are mostly unique placements, not repeating patterns.

BOM Hierarchy Summary

TE Airport Terminal (BUILDING) — 7 FLOOR children, origin (84.6, -51.2, -30.7)
  ├── TE Foundation  (FLOOR, FN)  — 6 discipline SETs,   703 instances
  ├── TE Ground Floor (FLOOR, GF) — 8 discipline SETs, 3,513 instances
  ├── TE Level 1     (FLOOR, L1)  — 8 discipline SETs, 2,070 instances
  ├── TE Level 2     (FLOOR, L2)  — 7 discipline SETs, 2,609 instances
  ├── TE Level 3     (FLOOR, L3)  — 7 discipline SETs, 1,798 instances
  ├── TE Level 4     (FLOOR, L4)  — 7 discipline SETs, 2,307 instances
  └── TE Roof        (FLOOR, RF)  — 7 discipline SETs, 35,428 instances
                                                        ------
                                                        48,428 total

Floor origins are zeroed (R16 fix) — offsets stored on BUILDING→FLOOR TACK lines as dx/dy/dz. BUILDING origin holds the world LBD anchor.
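
With floor origins zeroed, a leaf's world position is the BUILDING anchor plus the offsets stored on each TACK line down the tree. The sketch below assumes offsets compose by plain addition with no rotation. That is an assumption of this example, since the governing tack convention (BOMBasedCompilation.md §4) is not reproduced here. The `Tack` class and the offset values in the usage note are hypothetical.

```java
// Hypothetical helper: accumulates parent-relative offsets onto a world anchor.
final class Tack {
    // anchor = BUILDING origin (world LBD), offsets = dx/dy/dz per tree level, all in metres.
    static double[] toWorld(double[] anchor, double[]... offsets) {
        double[] p = anchor.clone();
        for (double[] o : offsets) {
            p[0] += o[0];
            p[1] += o[1];
            p[2] += o[2];
        }
        return p;
    }
}
```

For example, with the BUILDING origin (84.6, -51.2, -30.7), an illustrative floor tack of (0, 0, 30.7) and a leaf offset of (7.89, 9.04, 0), `toWorld` returns approximately (92.49, -42.16, 0.0).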

Compiled Output (sjtii_terminal.db)

| Table | Predicted | Actual | Notes |
|---|---|---|---|
| elements_meta | 48,428 | 48,428 | G1-COUNT PASS (Spec 2 fix: StoreyCompiler skip for CO) |
| Delta (enbloc vs walkthru) | 0 | 0 | Compilation is consistent |

Product catalog: 505 unique products → 48,428 placed instances (95.9× reuse).
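
The reuse figure is instances divided by unique products, where a unique product is a distinct dimensional signature (dx × dy × dz rounded to mm, as defined in the factorization section). A minimal sketch, assuming extents arrive in metres; the `TypeSignatures` class is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: derives dimensional signatures and the reuse factor.
final class TypeSignatures {
    // Signature = extents rounded to whole millimetres, e.g. "500x150x106".
    static String signature(double dxM, double dyM, double dzM) {
        return Math.round(dxM * 1000) + "x" + Math.round(dyM * 1000) + "x" + Math.round(dzM * 1000);
    }

    // Counts distinct signatures among {dx, dy, dz} extents given in metres.
    static int uniqueTypes(List<double[]> extentsM) {
        Set<String> seen = new HashSet<>();
        for (double[] e : extentsM) {
            seen.add(signature(e[0], e[1], e[2]));
        }
        return seen.size();
    }

    // Average number of placed instances per unique type.
    static double reuseFactor(int instances, int uniqueTypes) {
        return (double) instances / uniqueTypes;
    }
}
```

`reuseFactor(48428, 505)` is roughly 95.9, matching the figure above. Rounding to the millimetre means two plates that differ by a fraction of a millimetre collapse into one product.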

Implementation Phases — All DONE

| Phase | What | Status |
|---|---|---|
| TE-1 | Z-centroid band assignment, 7 storeys normalised | DONE |
| TE-2 | ExtractionPopulator: 51,088→48,428 active, REBAR deactivated | DONE |
| TE-3 | BUILDING→FLOOR→DISCIPLINE→LEAF for CO mode | DONE |
| TE-4 | M_Product_Category=CO from YAML, commercial dispatch | DONE |
| TE-5 | CO_TE in GATE_SCOPE, surefire property forwarding | DONE |
| TE-5B | Output DB produced, 216 IfcSlab gap diagnosed + fixed | DONE |
| TE-6/7 | Verb factorization: TILE/ROUTE/FRAME/CLUSTER (1,131 lines, 42.8:1) | DONE |

Steps to Arrive at Compiled Output (guide for future IFC conversions)

The TE pipeline demonstrates the generalised IFC→BOM→compiled-output chain. Each step is reusable for any new building — only the YAML changes.

Step 1: EXTRACT — Python IfcOpenShell → component_library.db
   ├── extract.py reads IFC, writes I_Element_Extraction + I_Geometry_Map
   ├── Per-element: AABB (min/max XYZ), ifc_class, orientation, material
   ├── Per-product: geometry mesh (vertices + faces) in component_geometries
   └── Output: component_library.db tables populated

Step 2: CLASSIFY — YAML declares building identity + discipline mapping
   ├── classify_te.yaml: prefix, building_type, M_Product_Category
   ├── disciplines: map ifc_class → discipline code (ARC, STR, FP, ...)
   ├── storey_bands: Z-centroid ranges → storey names
   └── Output: YAML file (only human invention in the chain)

Step 3: POPULATE — Java ExtractionPopulator enriches extraction
   ├── Reads component_library.db → I_Element_Extraction
   ├── Z-centroid storey normalisation (NULL storey → band assignment)
   ├── REBAR deactivation (is_active=0 for IfcReinforcingBar)
   ├── M_Product_ID linkage: element_ref → product catalog
   └── Output: component_library.db enriched (deterministic, no invention)

Step 4: BUILD BOM — Java DisciplineBomBuilder creates BOM hierarchy
   ├── Reads extraction by storey + discipline
   ├── Creates: root BOM → floor BOMs → leaf-group BOMs
   ├── Each LEAF line: child_product_id, dx/dy/dz (parent-relative), element_ref
   ├── BomValidator: 9 checks + 2 pre-flights (abort on any failure)
   └── Output: {PREFIX}_BOM.db (m_bom + m_bom_line + M_Product)

Step 5: PREPARE COMPILE DB — Shell prepares per-building temp DB
   ├── cp {PREFIX}_BOM.db → _XX_compile.db
   ├── Apply schema_snapshot_bom.sql (adds tables: C_DocType, c_order, etc.)
   ├── Inject C_DocType row (OutputDbPath, ExpectedElements)
   ├── Load DSL content from YAML-referenced .bim file
   └── Output: library/_XX_compile.db (temp, auto-cleaned)

Step 6: COMPILE — Java CompilationPipeline reads compile DB, writes output
   ├── BuildingRegistryTest drives compilation via Maven surefire
   ├── BOMWalker traverses hierarchy, PlacementCollectorVisitor collects positions
   ├── Tack convention (§4): each level's origin + line dx/dy/dz → world coords
   ├── BuildingWriter emits elements_meta + elements_rtree + geometries
   └── Output: DAGCompiler/lib/output/{building_type}.db

Step 7: VERIFY — Shell runs delta + Rosetta Stone gates
   ├── enbloc vs walkthru element count delta (must be 0)
   ├── Per-class breakdown, AABB centroid delta, geometry divergence
   ├── Rule 8 (world-absolute check), clash check
   └── Output: PASS/FAIL verdict log

Refactoring guide: To add a higher abstraction layer, the natural boundary is between Step 4 (BOM) and Step 6 (compile). The BOM is the contract interface — upstream changes (extraction, classification) only affect BOM content, downstream changes (compilation, verification) only read the BOM. A new verb (TILE, ROUTE) changes how Step 6 interprets BOM lines, but the BOM structure (m_bom + m_bom_line) stays the same.

What SH/DX Taught Us (Foundation Advantage)

  1. Placement determinism works: extract coords → compile. Terminal already has 100% positional match from Phase DE-4.
  2. BOM pattern works: m_bom hierarchy + m_bom_line with child_product_id. Extending to 9 disciplines is data, not code.
  3. M_Product catalog is extensible: Terminal needs ~200 more products. Same table, same pattern.
  4. Discipline dispatch works: ElementPersistence emits all disciplines. Terminal's 9 disciplines already compile correctly.
  5. IFCtoBOM pipeline is abstract: classify_te.yaml follows the same YAML-driven pattern as classify_sh.yaml and classify_dx.yaml.
  6. G5-PROVENANCE is abstract: 7 checks run per building via DynamicTest. No Terminal-specific test code needed.

The challenge is scale and variety, not architecture.

New Verbs & BOM Mechanisms Needed

Terminal introduces patterns that SH/DX didn't need. Each pattern maps to a BIM COBOL verb and a YAML section that carries user intent.

Verb: TILE SURFACE (roof deck — 33,324 elements, 65%)

Mechanism: 2D grid expansion. One BOM line with qty=N expands to N placements at computed grid positions (origin + i×stepX + j×stepY).

YAML intent:

roof_deck:
  panels:
    - name: DECK_Z19_WEST
      product: PLATE_500x150x106
      origin_m: [92.49, -42.16, 19.0]
      grid: [15, 294]          # columns × rows
      step_mm: [495, 150]      # X-step, Y-step
    - name: DECK_Z19_CENTRAL
      product: PLATE_500x150x106
      origin_m: [122.63, -42.16, 19.0]
      grid: [14, 174]
      step_mm: [495, 150]

BOM mechanism: m_bom_line.qty = grid[0] * grid[1]. Walker expands qty to instances, each getting position from grid formula. No 33K rows in BOM.

Verb: ROUTE (MEP piping — 9,345 elements, 18%)

Mechanism: 1D path following. Pipe segments + fittings along a routed path. Each run = origin, direction, segment lengths, fitting types at turns.

YAML intent:

mep_systems:
  fire_protection:
    storey: GF
    runs:
      - name: FP_MAIN_GF_01
        segments: [PipeSegment_50mm, PipeFitting_Elbow_50mm, ...]
        path_nodes_m: [[10.0, 5.0, 3.2], [10.0, 15.0, 3.2], [20.0, 15.0, 3.2]]
    sprinklers:
      - name: SPRINKLER_SET_GF
        product: FireSuppressionTerminal
        spacing_mm: 3000
        ceiling_offset_mm: 50
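
The path-following mechanism can be sketched by walking `path_nodes_m` and measuring each leg. The sketch only computes leg lengths. How segments and fittings are allotted along each leg is not specified here and is left out. The `RoutePath` class is a hypothetical name.

```java
// Hypothetical helper: measures a routed path given as {x, y, z} nodes in metres.
final class RoutePath {
    // Straight-line length of each leg between consecutive nodes.
    static double[] legLengths(double[][] nodes) {
        double[] out = new double[Math.max(0, nodes.length - 1)];
        for (int i = 0; i + 1 < nodes.length; i++) {
            double dx = nodes[i + 1][0] - nodes[i][0];
            double dy = nodes[i + 1][1] - nodes[i][1];
            double dz = nodes[i + 1][2] - nodes[i][2];
            out[i] = Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return out;
    }

    // Total routed length: the sum of all legs.
    static double totalLength(double[][] nodes) {
        double sum = 0;
        for (double leg : legLengths(nodes)) {
            sum += leg;
        }
        return sum;
    }
}
```

For the FP_MAIN_GF_01 nodes above, the two legs are 10 m each (one along Y, one along X), so the run is 20 m with one turn where an elbow fitting would sit.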

Rebar — REMOVED from input (2,660 elements deleted)

Removed (2026-03-18): Rebar (IfcReinforcingBar) is generated by a fast Python addon script in Bonsai that adds rebar to any beam in STR, so it need not be part of the main construction BOM. 2,660 elements deleted from Terminal_Extracted.db and component_library.db. Total TE elements: 51,088 → 48,428 (library).

IfcSensor — REMOVED from reference (4 elements deleted)

Removed (2026-03-18): IfcSensor (4 metadata-only elements, no spatial coords) is a Federation addon that generates onto finished construction — like rebar, it does not need compilation. Removed from SJTII_Terminal_extracted.db to enable G3-DIGEST verification. Total ref elements: 48,432 → 48,428 (matches output exactly).

Verb: WIRE LIGHTING (electrical — 814 elements)

Mechanism: 2D ceiling grid. Lights at regular spacing on a ceiling plane.

YAML intent:

electrical:
  storey: GF
  lighting:
    - name: LIGHTING_GF_MAIN
      product: LightFixture_600x600
      zone_m: [0, 0, 50, 30]   # minX, minY, maxX, maxY
      spacing_mm: [3000, 3000]
      height_m: 3.5
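
The number of fixtures implied by a zone and a spacing depends on the edge convention, which this document does not state. The sketch below assumes the first fixture sits on the zone's minimum edge and further fixtures follow at every `spacing_mm` up to the maximum edge, with no wall margin. The `LightingGrid` class is a hypothetical name.

```java
// Hypothetical helper: counts ceiling-grid positions for a rectangular zone.
final class LightingGrid {
    // Positions along one axis: first at min, then every spacing up to max (inclusive).
    static int countAlong(double minM, double maxM, int spacingMm) {
        double spacingM = spacingMm / 1000.0;
        // Small epsilon guards against floating-point undershoot on exact multiples.
        return (int) Math.floor((maxM - minM) / spacingM + 1e-9) + 1;
    }

    // zoneM = {minX, minY, maxX, maxY} in metres; spacingMm = {X, Y} in millimetres.
    static int count(double[] zoneM, int[] spacingMm) {
        return countAlong(zoneM[0], zoneM[2], spacingMm[0])
                * countAlong(zoneM[1], zoneM[3], spacingMm[1]);
    }
}
```

Under that convention, the LIGHTING_GF_MAIN zone (50 m × 30 m at 3000 mm spacing) yields 17 × 11 = 187 positions. A convention with half-spacing margins at the walls would give a different count.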

Verb: FRAME (structural grid — 590 elements)

Mechanism: Structural bay grid. Columns at grid intersections, beams spanning.

YAML intent:

structural:
  storey: GF
  frame:
    - name: FRAME_GF
      column: Column_W250
      beam: Beam_W310x60
      grid_m:
        x: [0, 6, 12, 18, 24, 30]
        y: [0, 8, 16]
      height_m: 4.0
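
A FRAME grid expands into columns at every x/y intersection and beams between adjacent intersections. The sketch assumes beams span in both directions, which the YAML above does not state explicitly. The `FrameGrid` class is a hypothetical name.

```java
// Hypothetical helper: expands grid_m axes into column and beam counts.
final class FrameGrid {
    // One column per grid intersection.
    static int columnCount(double[] xs, double[] ys) {
        return xs.length * ys.length;
    }

    // Beams between adjacent intersections along X and along Y (assumed both ways).
    static int beamCount(double[] xs, double[] ys) {
        return (xs.length - 1) * ys.length + xs.length * (ys.length - 1);
    }

    // Column base positions {x, y, z}, x varying fastest.
    static double[][] columnPositions(double[] xs, double[] ys, double z) {
        double[][] out = new double[xs.length * ys.length][];
        int n = 0;
        for (double y : ys) {
            for (double x : xs) {
                out[n++] = new double[] { x, y, z };
            }
        }
        return out;
    }
}
```

For FRAME_GF (6 x-lines, 3 y-lines) this gives 18 columns and, under the two-way assumption, 5×3 + 6×2 = 27 beams.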

BOM Mechanism: qty Expansion

The key new mechanism is m_bom_line.qty > 1. SH/DX have qty=1 (one line, one element). Terminal needs qty=N (one line, N elements at computed positions).

// BOMWalker expansion
for (MBOMLine line : children) {
    int qty = line.getQty();  // 1 for SH/DX, N for TE
    for (int i = 0; i < qty; i++) {
        visitor.visitLeaf(line, i);  // instance index
    }
}

Position computation per instance depends on the verb:
- TILE: origin + (i % cols) * stepX + (i / cols) * stepY
- ARRAY: origin + i * spacing * direction
- ROUTE: segment-by-segment path accumulation
- FRAME: grid intersection lookup
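
The TILE and ARRAY formulas translate almost literally into code. In the sketch below, `i / cols` is integer division, so the instance index decodes into a column (`i % cols`) and a row (`i / cols`). The `InstancePosition` class and its method names are hypothetical.

```java
// Hypothetical helper: turns an instance index into a position for TILE and ARRAY.
final class InstancePosition {
    // TILE: column = i % cols, row = i / cols (integer division); steps in metres.
    static double[] tile(double[] origin, int i, int cols, double stepX, double stepY) {
        int col = i % cols;
        int row = i / cols;
        return new double[] { origin[0] + col * stepX, origin[1] + row * stepY, origin[2] };
    }

    // ARRAY: i-th copy along a direction vector, spacing in metres.
    static double[] array(double[] origin, int i, double spacing, double[] direction) {
        return new double[] {
            origin[0] + i * spacing * direction[0],
            origin[1] + i * spacing * direction[1],
            origin[2] + i * spacing * direction[2]
        };
    }
}
```

With 15 columns, instance 16 decodes to column 1, row 1, one step along each axis from the origin.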

BOM Mechanism: Discipline Layer

Terminal adds Level 2 = DISCIPLINE between FLOOR and ASSEMBLY:

BUILDING → STOREY → DISCIPLINE → ASSEMBLY → LEAF

This requires bom_category on m_bom to carry discipline identity (ARC, STR, FP, ACMV, CW, ELEC, SP, LPG, REB). The walker doesn't need discipline-specific code — it's just another tree level. The YAML disciplines: section maps IFC classes to discipline categories:

disciplines:
  ARC:
    classes: [IfcWall, IfcSlab, IfcDoor, IfcWindow, IfcFurniture, IfcRoof,
              IfcPlate, IfcCovering, IfcRailing, IfcStairFlight]
  STR:
    classes: [IfcColumn, IfcBeam, IfcMember]
  REB:  # DEFERRED — IfcOpenShell Python generates dynamically
    classes: [IfcReinforcingBar]
  FP:
    classes: [IfcFireSuppressionTerminal, IfcAlarm, IfcSensor]
    system_type: [FireProtection]
  ACMV:
    classes: [IfcAirTerminal]
    system_type: [HVAC, AirConditioning]
  ELEC:
    classes: [IfcLightFixture, IfcElectricAppliance]
  CW:
    system_type: [ColdWater, DomesticWater]
  SP:
    system_type: [SanitaryPlumbing, Drainage]
  LPG:
    system_type: [Gas, LPG]
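
One way to read the `disciplines:` section is as two lookup tables, one keyed by IFC class and one by system type. The sketch below assumes a class match takes precedence and `system_type` is the fallback for classes shared across MEP disciplines (pipe segments, fittings). That precedence is an assumption of this example, and `DisciplineMap` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup built from the YAML disciplines section.
final class DisciplineMap {
    private final Map<String, String> byClass = new HashMap<>();
    private final Map<String, String> bySystem = new HashMap<>();

    DisciplineMap mapClass(String ifcClass, String discipline) {
        byClass.put(ifcClass, discipline);
        return this;
    }

    DisciplineMap mapSystem(String systemType, String discipline) {
        bySystem.put(systemType, discipline);
        return this;
    }

    // Class match wins; system_type is the fallback. Returns null if unclassified.
    String resolve(String ifcClass, String systemType) {
        String discipline = byClass.get(ifcClass);
        if (discipline == null && systemType != null) {
            discipline = bySystem.get(systemType);
        }
        return discipline;
    }
}
```

An `IfcBeam` resolves to STR by class alone, while an `IfcPipeSegment` needs its system type (for example ColdWater → CW) because the same class occurs in FP, CW, SP and LPG.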

YAML as User Intent

The classify_te.yaml carries all user intent for Terminal — the same pattern as classify_sh.yaml and classify_dx.yaml. The Java pipeline reads YAML, never hardcodes building-specific logic. Adding a new Rosetta Stone = writing a new YAML, not new Java code.

Verb Roadmap — What Terminal Still Needs

Current state (session 30): All MEP elements absorbed by CLUSTER (approximate, avg 3.7m error). Exact verbs (TILE/ROUTE/FRAME) handle only 108 instances. The roadmap below tracks promotion from CLUSTER → exact verb per discipline.

| Verb | Status | Discipline | Predicted | Actual (CLUSTER) | AD Table | Fidelity |
|---|---|---|---|---|---|---|
| TILE SURFACE | EXACT | ARC (roof) | 33,324 | 12 exact, rest CLUSTER | | PASS (0.0m) |
| ROUTE | EXACT | FP/CW/SP/LPG | 9,345+2,619 | 18 exact, rest CLUSTER | ad_fp_coverage | 0.32m max |
| FRAME | EXACT | STR | 590 | 78 exact, rest CLUSTER | | 1.4mm max (S51 fix) |
| CLUSTER | APPROX | all MEP | | 47,607 | | 29.1m max, 3.7m avg |
| ENCLOSE | DESIGNED | ARC (walls) | ~1,038 | not started | | |
| DISTRIBUTE | DESIGNED | ARC (furniture) | ~2,123 | not started | | |
| ~~ARRAY~~ | ~~REMOVED~~ | ~~REB~~ | ~~2,660~~ | REB excluded | | |

Gap: CLUSTER's 3.7m avg error is the G2-VOLUME 13.71% drift root cause. Promotion path: analyse each CLUSTER group for step-uniformity, reclassify groups with ≤1mm step variance as TILE/ROUTE/FRAME. Non-uniform residue stays CLUSTER.
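
The promotion test can be sketched as a step-uniformity check along one axis. The sketch interprets "step variance" as the spread (max minus min) of consecutive gaps. That reading is an assumption, as is the `StepUniformity` class itself. It is not the ClusterReclassifier implementation.

```java
import java.util.Arrays;

// Hypothetical check: is a group of positions evenly stepped along one axis?
final class StepUniformity {
    // Spread (max - min) of consecutive gaps, in millimetres; input in metres.
    static double stepSpreadMm(double[] positionsM) {
        if (positionsM.length < 2) {
            return 0;
        }
        double[] p = positionsM.clone();
        Arrays.sort(p);
        double min = Double.MAX_VALUE;
        double max = 0;
        for (int i = 1; i < p.length; i++) {
            double gapMm = (p[i] - p[i - 1]) * 1000;
            min = Math.min(min, gapMm);
            max = Math.max(max, gapMm);
        }
        return max - min;
    }

    // A group qualifies for an exact verb when its steps agree within the tolerance.
    static boolean promotable(double[] positionsM, double toleranceMm) {
        return positionsM.length >= 2 && stepSpreadMm(positionsM) <= toleranceMm;
    }
}
```

A row of plates at 0, 0.495, 0.99 and 1.485 m passes at a 1 mm tolerance. Moving the third position to 1.1 m makes the gaps 495 mm and 605 mm, and the group stays CLUSTER.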

Three-Layer Validation Resolution (S100-p84)

The 1,163 unfactored elements are not a pattern-mining problem — they're a standards application problem. The iDempiere three-layer validation resolves most of them without manual pattern recognition:

Layer 1: DocEvent per Org — blanket discipline rules. When AD_Org=FP, the org-scoped ModelValidator fires top-down during BOM walk. General placement rules (spacing, connectivity, host). Shared recipes in ERP.db (FP_SYSTEM, ACMV_SYSTEM, etc.) provide the abstract BOM templates.

Layer 2: ASI (AttributeSet Instance) — per-product/per-instance attributes. K-factor, pipe length, duct size. Same as customer options in manufacturing — modifies placement without changing the recipe.

Layer 3: AD_Val_Rule — same DocEvent engine, narrower scope. User adds a specific rule for a particular exploded C_OrderLine. Not a separate mechanism — a different granularity. Government standards (NFPA 13, UBBL, MS1183) are general rules (Layer 1). Layer 3 is for user-specific overrides.

This mirrors iDempiere document processing: ModelValidator (Org-scoped) → line item resolution (ASI) → validation rules (AD_Val_Rule).

Resolution estimate (1,163 unfactored elements):

| Category | Count | Resolution | Layer |
|---|---|---|---|
| SP/CW/LPG pipes | ~450 | Routing standards (branch length, riser sizing) | DocEvent |
| FP devices (alarms, extinguishers) | ~30 | NFPA/UBBL spacing rules | DocEvent |
| ELEC (switches, receptacles) | ~20 | Receptacle count per area | DocEvent |
| Doors/windows | ~30 | Fire door placement per UBBL egress | DocEvent |
| ACMV fittings | ~25 | Duct routing standards | DocEvent |
| STR columns (irregular grid) | ~63 | ~50% by rules, rest human/AI pattern | DocEvent + manual |
| Stairs | ~178 | Stair rules (see below) | DocEvent + ASI |
| Walls | ~41 | Define space, not fill it; stays unfactored | Unfactored |
| Furniture/fixtures | ~50 | Architect's choice; stays unfactored | Unfactored |
| Remaining misc | ~276 | Mixed | Mixed |
| Total resolvable | ~550 (47%) | | |

Stair Validation Rules — Already Partially Implemented

Infrastructure exists: ad_stair_requirement (7 rows, UBBL/IBC/NFPA), VerticalCirculationAD.java (StairRequirement record), VerticalCirculationValidator.java (count, width, travel distance), StairwellCheck.java (geometry-based UBBL check).

The 178 unfactored stair components (runs, landings, stringers) across GF-L4 are inherently variable in geometry, but their dimensions are rule-governed:

| Rule | Value | Standard | In ad_stair_requirement? |
|---|---|---|---|
| Riser height | 100-175mm (public), 100-190mm (residential) | UBBL By-Law 172 | YES |
| Tread depth | 250-300mm (public), min 225mm (residential) | UBBL By-Law 172 | YES |
| 2R+G comfort | 550-700mm (ideal 630mm) | Blondel formula | NO (add) |
| Stairway width | min 1050mm (public), 1200mm (high-rise >18m) | UBBL By-Law 171 | YES |
| Headroom | min 2000mm | UBBL practice / BS 5395 | NO (add) |
| Landing length | min = stair width | UBBL general | YES |
| Max flight rise | 3.0m before landing | UBBL By-Law 168 | NO (add) |
| Riser uniformity | max 9.5mm variance between risers | IBC s1011.5.4 | NO (add) |
| Handrail height | 900mm (UBBL), 864-965mm (IBC) | UBBL / IBC s1014.2 | YES |
| Guard height | min 1070mm (42") | IBC s1015.3 | NO (add) |
| Fire rating | 1.0hr (<18m), 2.0hr (>18m) | UBBL By-Law 166(3) | YES |

TE is >18m (59.8m tall) → requires 2.0hr fire-rated stairs, min 1200mm width, pressurization (50-100 Pa per UBBL By-Law 178), min 2 stairs.

These rules constrain stair geometry enough that ASI (per-instance run length, landing width) handles the remaining variance. The 178 stair components aren't "irregular" — they follow dimensional rules with per-instance variants. EYES geometric proofs (P04 Z-band, P01 positive extent) can verify the result.
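
Several of the tabulated rules reduce to range checks on riser and tread. The sketch below combines the public-stair riser and tread limits with the 2R+G comfort band, using only values from the table. `StairCheck` is a hypothetical class and is not part of VerticalCirculationValidator.

```java
// Hypothetical check using the public-stair limits from the table above.
final class StairCheck {
    // 2R + G comfort value (Blondel formula), in millimetres.
    static int comfort(int riserMm, int treadMm) {
        return 2 * riserMm + treadMm;
    }

    // Riser 100-175mm, tread 250-300mm, comfort 550-700mm.
    static boolean publicStairOk(int riserMm, int treadMm) {
        boolean riserOk = riserMm >= 100 && riserMm <= 175;
        boolean treadOk = treadMm >= 250 && treadMm <= 300;
        int c = comfort(riserMm, treadMm);
        boolean comfortOk = c >= 550 && c <= 700;
        return riserOk && treadOk && comfortOk;
    }
}
```

A 165 mm riser with a 300 mm tread gives 2×165 + 300 = 630 mm, the ideal comfort value, and passes. A 190 mm riser fails the public riser limit regardless of tread.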

ROUTE DUCTS and ROUTE PIPES are variants of ROUTE SPRINKLERS — same path-following walker, different M_Product leaves and AD regulation tables. Implementation cost: parameter mapping + AD table creation, not new verb code.

FRAME is structural bay grid placement. Columns at grid intersections (BIM_Component, identical), beams spanning between columns (BIM_Slab, IsInstance=1 if spans vary). Reads structural grid from YAML.

S51 FRAME LBD fix: Detection now clusters minX/minY (LBD positions) directly instead of centroids. The old approach computed LBD offsets as centroid - halfW[0], using element[0]'s half-width. Same-product elements with different actual dimensions (e.g., beams spanning 10m vs 8m bays) had up to 1.08m error. The fix eliminates the centroid→LBD conversion entirely — LBD positions ARE the grid positions. Embedded halfW,halfD in the verb formula (FRAME:x1,...|y1,...|halfW,halfD) preserves detection-time geometry metadata. Fidelity improved from 1.08m to 1.4mm. FRAME promoted back to EXACT_VERBS (gating at ≤5mm).
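
The size of the old error is easy to reproduce: deriving an LBD from a centroid with a shared half-width is only correct for elements whose real width equals the shared one. The sketch below uses illustrative numbers and a hypothetical `LbdDemo` class. It is not the detection code.

```java
// Hypothetical demonstration of the centroid -> LBD conversion error.
final class LbdDemo {
    // Old approach: LBD derived from the centroid using element[0]'s half-width.
    static double lbdFromCentroid(double centroidX, double sharedHalfW) {
        return centroidX - sharedHalfW;
    }

    // Absolute error of the old approach for an element with its own actual width.
    static double error(double minX, double actualWidth, double sharedHalfW) {
        double centroidX = minX + actualWidth / 2.0;
        return Math.abs(lbdFromCentroid(centroidX, sharedHalfW) - minX);
    }
}
```

If element[0] is a 10 m beam (half-width 5 m) and another beam of the same product spans 8 m, the derived LBD is off by |4 - 5| = 1 m. Reading minX directly has no such term, which is why the fix clusters LBD positions.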

Why this matters for future buildings: Every commercial/institutional building has a structural grid. Warehouses (20m bays), stadiums (40m spans), high-rises (mixed column sizes per floor) — all use FRAME. The LBD clustering approach scales to any grid irregularity because it never converts between coordinate systems. This also establishes the pattern for GPU instancing: FRAME elements at grid intersections are natural candidates for hardware instanced rendering, since they share the same product geometry placed at known grid positions.

S51b: Validation Rules ARE the Patterns — ClusterPatternAnalyser

ClusterPatternAnalyser confirms that mined validation rules (M1-M17) describe the actual spatial patterns in CLUSTER groups. The data:

| Product Type | Groups | Verdict | Rule Match |
|---|---|---|---|
| Sprinkler heads (pendent) | 5 storeys | ZONE (rule-governed) | M1 NN spacing 3.0-4.5m |
| Sprinkler heads (upright) | 5 storeys | ZONE (rule-governed) | M1 NN spacing |
| Light fixtures (LED T8) | 5 storeys | ZONE (rule-governed) | M4 grid ~3964mm |
| RC Beams (300×750, 500×700) | 4 storeys | ZONE + FRAME | M6/M7 bay span |
| RC Columns | 3 storeys | ZONE | M14 vertical continuity |
| Waiting room seats | GF | ZONE | Furniture distribution |

Key finding: Pipes/fittings (Poly Steel, UPVC) are MIXED — multi-Z, irregular positions, 100-200+ ASI size variants. These are MEP routing networks, not grid patterns. Their "pattern" is the routing rule (M2 branch max length, M3 riser diameter), not a spatial formula. The validation rule IS the placement constraint: "max 12m branch, min 50mm main riser, 150mm clearance from electrical."

Implication for EN-BLOC: Sprinklers, lights, beams, columns form ZONE patterns describable by validation rules. Pipes don't — they're routing networks governed by compliance rules, not spatial grids. EN-BLOC for pipes stays as CLUSTER (lossless replay). EN-BLOC for grid elements can be promoted to TILE/FRAME with ASI.

ASI taxonomy (BBC.md §3.5.1): Extraction seeds M_AttributeSet tables — confirms which product attributes are instance-varying (pipe length, beam span) vs fixed (pipe diameter, beam section). Per-instance values are designer decisions (generative path), not extraction data. The taxonomy is the reusable asset.

Tools: ClusterReclassifier (promotion analysis), ClusterPatternAnalyser (rule confirmation). Run: java ClusterPatternAnalyser library/TE_BOM.db.

ENCLOSE is wall perimeter placement. Follows a 2D closed path, inserts wall segments (BIM_Wall, IsInstance=1 — length varies) and openings at specified positions. Needed for ARC walls (~330) + openings (~236 windows, ~176 doors).

DISTRIBUTE is irregular zone placement for elements that don't follow formula patterns — furniture, equipment, proxies (~2,123 elements, 4.2%). These get flat per-element BOM lines (qty=1 each).

Discipline Model — See DISC_VALIDATION_DB_SRS.md §10.4.1

Discipline is a line attribute, not a tree level. The per-discipline spatial model (covering vs inside, verb profiles, validation rules, GoF patterns, BOM tree impact) is in DISC_VALIDATION_DB_SRS.md §6. That spec governs all buildings, not just TE. TE is the ground truth.

ERP Model Architecture — Terminal as Third Stone

Interactive ERD: docs/terminal_erd.html — 5-tab visualization with entity relationships, BOM hierarchy, verb→ERP mapping, M_Product_Category scoping, and ROUTE-as-BOM tree with M_AttributeSetInstance.

Terminal is the first building to stress the full iDempiere ERP model. SH/DX used BIM_Component (IsInstanceAttribute=0 — every element identical). Terminal forces M_AttributeSet/Instance into active service and reveals the natural correspondence between BIM construction hierarchy and ERP document flow.

Spatial MRP (see docs/ConstructionAsERPII.txt): Traditional MRP answers "what materials are needed and when?" The BIM Compiler answers "what materials are needed, where, and how they connect." A building is an assembled-to-order product: the YAML is the customer order, the classify file is the product configuration, and the compiler runs the production order. We're not inventing a new paradigm; we're adding a spatial dimension to iDempiere's battle-tested manufacturing model.

Future: M_Connection — element-to-element connection tracking (pipe segment to fitting, beam to column) with port semantics and verification status. Natural extension of ROUTE-as-BOM-tree. Candidate for G8 gate (connection audit).

M_Product_Category — Hierarchy Shape by Top-Level Category

Aligns to MANIFESTO.md §The Category Cascade. Classification lives on M_Product_Category at every cascade level (see DATA_MODEL.md §7). DocBaseType was removed (S84, W012). DocSubType retained for iDempiere C_DocType compatibility.

The top-level M_Product_Category determines the hierarchy shape:

| M_Product_Category | Hierarchy | L2 Axis | Compilation Path |
|---|---|---|---|
| RE (Residential) | BUILDING → FLOOR → ROOM → SET → LEAF | Room type (LI, KT, BD) | EN-BLOC (singularity) |
| CO (Commercial) | BUILDING → FLOOR → DISCIPLINE → ASSEMBLY → LEAF | Discipline (ARC, FP, STR) | WALK THRU (discipline-driven) |

The RE path expects floor_rooms in YAML (Living, Kitchen, Bedroom) and walks rooms to find furniture sets. The CO path expects disciplines and never looks for rooms. Forcing Terminal through the RE path would require fake "rooms" for discipline zones — that's technical debt avoided.

The building prefix (SH/DX/TE) carries identity for BOM selection. When a second commercial building arrives (mall, factory), it will be M_Product_Category=CO with a different prefix. The hierarchy shape stays FLOOR→DISCIPLINE→ASSEMBLY.
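
The category-driven hierarchy shape can be sketched as a small lookup. This is an illustration only: the `TopCategory` enum and its method names are hypothetical and are not taken from the compiler source; only the level names come from the table above.

```java
import java.util.List;

/** Illustrative only: hierarchy shape keyed by top-level M_Product_Category. */
enum TopCategory {
    RE(List.of("BUILDING", "FLOOR", "ROOM", "SET", "LEAF")),
    CO(List.of("BUILDING", "FLOOR", "DISCIPLINE", "ASSEMBLY", "LEAF"));

    private final List<String> levels;

    TopCategory(List<String> levels) {
        this.levels = levels;
    }

    /** BOM levels from L0 (BUILDING) down to the leaf. */
    List<String> levels() {
        return levels;
    }

    /** The Level 2 axis: ROOM for RE buildings, DISCIPLINE for CO buildings. */
    String level2Axis() {
        return levels.get(2);
    }
}
```

Under this sketch a second commercial building would reuse `TopCategory.CO` unchanged; only its prefix would differ.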

M_Product_Category — Cascade Levels

M_Product_Category forms a cascade where each level's category defines the swap pool at that level. Room categories appear under RE buildings, discipline categories under CO buildings, and shared categories (storeys, structural) appear under both:

| Category Type | Codes | BOM Level | Scope |
|---|---|---|---|
| Storey | GF, L1, L2, L3, L4, RF, FN | Level 1 (FLOOR) | Shared (RE + CO) |
| Room | LI, KT, BD, BT, DN, FR | Level 2 (RE only) | RE buildings |
| Discipline | ARC, STR, FP, ACMV, ELEC, CW, SP, LPG | Level 2 (CO only) | CO buildings |
| Assembly | (verb-specific groupings) | Level 3 | Shared |

Room and discipline codes operate at different BOM levels and never compete. Storeys are shared across RE and CO — always at Level 1. The Level 2 axis changes from room-type to discipline-type based on the top-level M_Product_Category. No new tables needed; M_Product_Category holds both sets, scoped by cascade level.

M_AttributeSet/Instance — Per-Verb Usage

SH/DX: zero elements needed instance attributes. Terminal changes that:

| Verb | AttributeSet | IsInstance | Reason |
|---|---|---|---|
| TILE SURFACE | BIM_Component | 0 | All 33K roof plates identical; position varies, not dimensions |
| ROUTE | BIM_Pipe / BIM_Conduit | 1 | Each pipe segment has different length |
| WIRE LIGHTING | BIM_Component | 0 | All fixtures identical |
| FRAME (columns) | BIM_Component | 0 | All columns identical per grid |
| FRAME (beams) | BIM_Slab | 1 | Beam spans may vary by bay |

M_AttributeSetInstance is needed for ROUTE-family verbs (~9,345 FP/CW/SP/LPG pipe elements with varying lengths). TILE/ARRAY/WIRE produce identical instances — the formula handles position, not the attribute set.

TILE — Pattern as Verb Parameter, Not AttributeSet

TILE is BOMQty — the M_Product leaf spreads over an AABB with its orientation. The pattern (grid formula) lives on W_Verb_NodeProduct, not M_AttributeSet:

C_OrderLine (WHAT):   M_Product = ROOF_DECK_PANEL_SET, qty = 4,410
W_Verb_Node (HOW):  Verb = TILE SURFACE
  W_Verb_NodeProduct: origin, grid_cols=15, grid_rows=294, step_x=495, step_y=150
M_BOM_Line dx/dy/dz (WHERE): AABB = 7,425 × 44,100 mm (the filled envelope)

Changing the grid (16×294 instead of 15×294) changes only W_Verb_NodeProduct. The same PLATE_500x150x106 product appears in different TILE patterns across different roof bays. Clean separation: verb owns the formula, BOM owns the qty.
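
A minimal sketch of how a TILE formula expands, assuming each panel occupies exactly one grid step so that the filled envelope equals cols × step_x by rows × step_y. The `TileGrid` class is hypothetical; only the parameter values (15 × 294 grid, 495 mm and 150 mm steps) come from the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of TILE expansion; not the compiler's actual API. */
final class TileGrid {
    final int cols;
    final int rows;
    final double stepX; // mm
    final double stepY; // mm

    TileGrid(int cols, int rows, double stepX, double stepY) {
        this.cols = cols;
        this.rows = rows;
        this.stepX = stepX;
        this.stepY = stepY;
    }

    /** Quantity implied by the formula: grid_cols x grid_rows. */
    int qty() {
        return cols * rows;
    }

    /** Filled envelope in mm as {width, depth}. */
    double[] envelope() {
        return new double[] { cols * stepX, rows * stepY };
    }

    /** Origin-relative offsets of every instance, row by row. */
    List<double[]> offsets() {
        List<double[]> out = new ArrayList<>(qty());
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out.add(new double[] { c * stepX, r * stepY });
            }
        }
        return out;
    }
}
```

With the values above, `new TileGrid(15, 294, 495, 150)` gives a quantity of 4,410 and an envelope of 7,425 × 44,100 mm, matching the C_OrderLine and M_BOM_Line figures in the example.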

ROUTE — Segments as BOM Tree + M_AttributeSetInstance

A ROUTE is not a flat list — it's a BOM tree. Each segment is a BOM line with instance attributes (varying length). Fittings are fixed-geometry components. Branches are sub-BOMs:

FP_MAIN_GF_01 (BOM, bom_category: FP)
├── SEGMENT_01 (M_Product: PIPE_CW_50MM)
│   └── M_AttributeSetInstance: {length_mm: 3200}    ← BIM_Pipe, IsInstance=1
├── FITTING_01 (M_Product: ELBOW_90_50MM)
│   └── (no instance — BIM_Component, fixed geometry)
├── SEGMENT_02 (M_Product: PIPE_CW_50MM)
│   └── M_AttributeSetInstance: {length_mm: 4800}
├── TEE_01 (M_Product: TEE_50x25MM)
│   └── branches to:
│       └── BRANCH_RUN_01 (sub-BOM)
│           ├── SEGMENT_B1 (PIPE_CW_25MM, length=1200mm)
│           ├── SPRINKLER_01 (SPRINKLER_UPRIGHT_K80)
│           ├── SEGMENT_B2 (PIPE_CW_25MM, length=4600mm)
│           └── SPRINKLER_02 (SPRINKLER_UPRIGHT_K80)
└── SEGMENT_03 (M_Product: PIPE_CW_50MM)
    └── M_AttributeSetInstance: {length_mm: 2100}

This mirrors iDempiere's configurable product model: a shirt has size/color as M_AttributeSet variants. A pipe segment has length as M_AttributeSet variant. The BOM tree says "this run needs: 3 segments + 1 elbow + 1 tee + 1 branch." The instances say "segment 1 is 3200mm, segment 2 is 4800mm."

The leaf M_Product set is small: pipe sizes (25mm, 50mm, 75mm), elbows, tees, reducers, sprinkler heads, valves. The ROUTE verb assembles them into run-specific BOM trees with per-segment instance attributes.
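
The run above can be modelled as a small tree in which only segments carry an instance attribute. This is a sketch under assumptions: `RouteNode` and its factory methods are hypothetical names, and the only data used is the example tree from this section.

```java
import java.util.List;

/** Hypothetical node type for a ROUTE BOM tree; names are illustrative only. */
final class RouteNode {
    final String name;
    final String product;            // null for pure grouping nodes
    final Integer lengthMm;          // per-instance attribute; null for fixed-geometry parts
    final List<RouteNode> children;

    private RouteNode(String name, String product, Integer lengthMm, List<RouteNode> children) {
        this.name = name;
        this.product = product;
        this.lengthMm = lengthMm;
        this.children = children;
    }

    /** A pipe segment: needs an attribute set instance because its length varies. */
    static RouteNode segment(String name, String product, int lengthMm) {
        return new RouteNode(name, product, lengthMm, List.of());
    }

    /** A fitting, head, or sub-BOM: fixed geometry, optional children. */
    static RouteNode part(String name, String product, RouteNode... children) {
        return new RouteNode(name, product, null, List.of(children));
    }

    /** Total pipe length in mm across this subtree. */
    int totalLengthMm() {
        int sum = lengthMm == null ? 0 : lengthMm;
        for (RouteNode c : children) {
            sum += c.totalLengthMm();
        }
        return sum;
    }

    /** Number of nodes in this subtree that carry an instance attribute. */
    int segmentCount() {
        int n = lengthMm == null ? 0 : 1;
        for (RouteNode c : children) {
            n += c.segmentCount();
        }
        return n;
    }

    /** The FP_MAIN_GF_01 example from the text. */
    static RouteNode example() {
        return part("FP_MAIN_GF_01", null,
            segment("SEGMENT_01", "PIPE_CW_50MM", 3200),
            part("FITTING_01", "ELBOW_90_50MM"),
            segment("SEGMENT_02", "PIPE_CW_50MM", 4800),
            part("TEE_01", "TEE_50x25MM",
                part("BRANCH_RUN_01", null,
                    segment("SEGMENT_B1", "PIPE_CW_25MM", 1200),
                    part("SPRINKLER_01", "SPRINKLER_UPRIGHT_K80"),
                    segment("SEGMENT_B2", "PIPE_CW_25MM", 4600),
                    part("SPRINKLER_02", "SPRINKLER_UPRIGHT_K80"))),
            segment("SEGMENT_03", "PIPE_CW_50MM", 2100));
    }
}
```

In this model the example run has five segments needing instance attributes and 15,900 mm of pipe in total (10,100 mm on the main, 5,800 mm on the branch).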

Val_Rule — Regulations as Domain AD Tables

ROUTE verbs must obey building regulations (UBBL, NFPA 13, MS 1910). The question: how to capture these constraints? iDempiere's AD_Val_Rule uses SQL WHERE fragments. BIM needs domain-specific AD tables instead — they're queryable, YAML-declarable, and compose with verb compliance checking.

| Regulation | AD Table | Example Constraint |
|---|---|---|
| Sprinkler spacing | ad_fp_coverage | max_spacing_mm <= 4600 WHERE hazard='ORDINARY' |
| Pipe sizing for flow | ad_fp_coverage | diameter_mm >= 50 WHERE flow_lpm > 200 |
| Max branch length | ad_fp_coverage | branch_length_mm <= 12000 |
| Receptacle count/area | ad_space_type_mep | receptacle_count >= area_sqm / 10 |
| Duct sizing per ACH | ad_acmv_sizing | duct_area_mm2 >= cfm / velocity |
| Routing method | ad_fp_coverage | routing_method IN ('TREE','LOOP','GRID') |

Each verb reads its AD regulation table to determine sizing, spacing, and method. The verb output (BOM tree) is provably compliant. The Rosetta Stone gate can verify compliance as a future G7 check (regulation audit).

Routing method is a strategy selection on the AD table:

  • TREE: main → branches → heads (most common)
  • LOOP: ring main with branches (redundancy)
  • GRID: parallel mains with cross-connections (large areas)

Same leaf products (pipes, fittings, heads), different BOM tree structure. The method column on ad_fp_coverage determines which ROUTE variant runs.

YAML intent for regulations:

fire_protection:
  hazard_class: ORDINARY
  coverage_area_sqm: 12.1        # UBBL Table 5.1
  max_spacing_mm: 4600
  min_pipe_diameter_mm: 25
  routing_method: TREE
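
A compliance check against these values might look like the sketch below. It is not the actual verb or gate code: `FpCoverageRule` is a hypothetical class, the two limits mirror `max_spacing_mm` and `min_pipe_diameter_mm` from the YAML, and head positions are assumed to be sorted distances in mm along one branch.

```java
/** Hypothetical in-memory view of one ad_fp_coverage row. */
final class FpCoverageRule {
    final int maxSpacingMm;
    final int minPipeDiameterMm;

    FpCoverageRule(int maxSpacingMm, int minPipeDiameterMm) {
        this.maxSpacingMm = maxSpacingMm;
        this.minPipeDiameterMm = minPipeDiameterMm;
    }

    /** True when every consecutive head-to-head gap is within the maximum spacing. */
    boolean spacingOk(int[] headPositionsMm) {
        for (int i = 1; i < headPositionsMm.length; i++) {
            if (headPositionsMm[i] - headPositionsMm[i - 1] > maxSpacingMm) {
                return false;
            }
        }
        return true;
    }

    /** True when the pipe is at least the minimum diameter. */
    boolean diameterOk(int diameterMm) {
        return diameterMm >= minPipeDiameterMm;
    }
}
```

For the branch in the ROUTE example, the heads sit 1,200 mm and 5,800 mm along the run, so the 4,600 mm gap is exactly at the limit and passes.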

C_Order/C_OrderLine — Three-Way Separation

The Terminal C_Order in iDempiere terms:

C_Order (header):
  C_DocType_ID: CO_TE
  Description: SJTII Airport Terminal

C_OrderLine (tab — one per storey-discipline BOM):
  Line 10: FLOOR_TE_FDN     qty=1     ← Foundation
  Line 20: FLOOR_TE_GF      qty=1     ← Ground Floor
    Line 20.10: ARC_TE_GF   qty=1     ← Architecture
    Line 20.20: STR_TE_GF   qty=1     ← Structure
    Line 20.30: FP_TE_GF    qty=1     ← Fire Protection
      → W_Verb_Node: ROUTE SPRINKLERS "FP_MAIN_GF_01"
        path_nodes, pipe_product, branch_spacing...
    Line 20.40: ACMV_TE_GF  qty=1
    Line 20.50: ELEC_TE_GF  qty=1
    Line 20.60: CW_TE_GF    qty=1
    Line 20.70: SP_TE_GF    qty=1
    Line 20.80: LPG_TE_GF   qty=1
  Line 70: FLOOR_TE_RF      qty=1     ← Roof
    → W_Verb_Node: TILE SURFACE (grid formula per bay)

The three-way separation governs the entire architecture:

| Concern | ERP Table | What It Carries |
|---|---|---|
| WHAT to build | C_OrderLine | Which M_Product/M_BOM, qty |
| WHERE it goes | M_BOM_Line dx/dy/dz | Spatial relationships (tack offsets) |
| HOW to build | W_Verb_Node | Verb parameters (grid, path, method) |

The 7-storey × 8-discipline grid produces ~40-50 C_OrderLines — a normal iDempiere sales order size. The user sees storeys as order lines, disciplines as sub-lines, and verbs as manufacturing instructions. The YAML is the order form; the compiler generates the transactional records.

Full BOM Tree With ERP Mapping

L0: BUILDING_TE_STD (BUILDING, M_Product_Category=CO)
    C_Order = CO_TE
    ├─ L1: FLOOR_TE_GF (FLOOR, bom_category=GF)
    │  C_OrderLine #20
    │  ├─ L2: ARC_TE_GF (DISCIPLINE, bom_category=ARC)
    │  │  C_OrderLine #20.10
    │  │  └─ L3: [flat placement — walls, doors, windows, furniture]
    │  ├─ L2: STR_TE_GF (DISCIPLINE, bom_category=STR)
    │  │  C_OrderLine #20.20
    │  │  └─ L3: FRAME verb → columns at grid, beams spanning
    │  ├─ L2: FP_TE_GF (DISCIPLINE, bom_category=FP)
    │  │  C_OrderLine #20.30
    │  │  W_Verb_Node: ROUTE SPRINKLERS
    │  │  Val_Rule: ad_fp_coverage (spacing, sizing, method)
    │  │  └─ L3: BOM tree of runs/branches/heads
    │  │     M_AttributeSetInstance per segment (varying lengths)
    │  ├─ L2: ACMV_TE_GF (DISCIPLINE, bom_category=ACMV)
    │  │  W_Verb_Node: ROUTE DUCTS
    │  │  Val_Rule: ad_acmv_sizing (ACH, duct sizing)
    │  │  └─ L3: duct runs + air terminals
    │  ├─ L2: ELEC_TE_GF (DISCIPLINE, bom_category=ELEC)
    │  │  W_Verb_Node: WIRE LIGHTING
    │  │  Val_Rule: ad_space_type_mep (receptacle count)
    │  │  └─ L3: ceiling grid + circuits
    │  └─ L2: CW/SP/LPG_TE_GF
    │     W_Verb_Node: ROUTE (per system)
    │     └─ L3: pipe runs + terminals
    ├─ L1: FLOOR_TE_L01 ... FLOOR_TE_L04
    │  (same discipline structure per storey)
    └─ L1: FLOOR_TE_RF (FLOOR, bom_category=RF)
       C_OrderLine #70
       ├─ L2: ARC_TE_RF (DISCIPLINE, bom_category=ARC)
       │  W_Verb_Node: TILE SURFACE (per bay)
       │  └─ L3: 33K panels from ~20 TILE formulas
       │     BOMQty = grid_cols × grid_rows per formula
       └─ L2: [other disciplines at roof level]

Current State (2026-03-28, S100-p84 audit)

  • BOM walk compiler LIVE (S100-p72). All buildings compile via single BOM walk path.
  • Gate: 6/7 PASS, 1 WARN (C9). G0-COMPILED PASS. G1-G6 PASS. C8 PASS. C9 WARN (60 axis swaps).
  • Output: DAGCompiler/lib/output/sjtii_terminal.db — 48,428 elements, 251MB.
  • No cheating detected (S100-p84 forensic audit). Single write path, no extraction DB access, no TE-conditional logic in compilation, tamper seal INTACT (73 files).

Rosetta Stone (2026-03-28 08:50): IFCtoBOM QA all PASS. BOM walk 339ms. Write 8.3s. Total pipeline 13s.

Audit Findings — S100-p84 Forensic

What the pipeline log tells us about BOM correction targets:

| Area | Finding | Fix Direction |
|---|---|---|
| C9 axis swap (60 walls) | CLUSTER groups mix wall orientations. Rank-matcher assigns W↔D incorrectly when walls face different directions within the same group. | Split CLUSTER groups by orientation during IFCtoBOM verb detection. |
| Unfactored elements (1,163) | 342 UPVC pipes, 57 rectangular columns, 44 HDPE pipes, 178 stair components, misc fixtures. IFCtoBOM couldn't find regular spatial patterns. | Human/AI-assisted pattern recognition → recognised_patterns in TerminalAnalysis → IFCtoBOM crafts by hand. Deterministic, reproducible. |
| P04 Z-band (87% violations) | Airport spans Z=-30.6m (foundation) to Z=+22.6m (roof). Default P04 band [-8.5, 10.5] too narrow. | Per-building P04 calibration or derive from BOM storey Z ranges. |
| ProveStage 0ms | Prover skips TE ("no proof aggregate"). Zero P01-P28 mathematical proof coverage. | Wire prover for CO buildings (currently only fires for RE). |
| H6 "No rooms found" | TE has no room-level BOM structure. ValidationStage completeness check skipped. | Expected for CO path. Room-level validation deferred until assembly sub-groupings added (Level 3). |

Unfactored element breakdown (mining targets):

| Product | Count | Floor(s) | Opportunity |
|---|---|---|---|
| Pipe Types:jkrME_pipe_UPVC | 342 | All | Biggest win: branching pipe runs, candidate for ROUTE |
| M_Rectangular Column:600x300mm | 57 | Multiple | Irregular grid; may need human-identified pattern |
| Pipe Types:jkrME_pipe_HDPE | 44 | Multiple | Drainage pipes; routing networks |
| Stair components (various) | 178 | GF-L4, RF | Inherently irregular; likely stays unfactored |
| Walls (various) | 41 | L2, L3, GF | Small count, low priority |
| Furniture/fixtures | 22 | GF, L3 | One-off placements; stays unfactored |

Discipline factorization quality (from IFCtoBOM log):

| Floor | Best factored | Worst factored | Notes |
|---|---|---|---|
| RF | ARC: 33,386 instances from 6 patterns | SP: 0 patterns, 11 unfactored | Roof is 69% of building |
| GF | FP: 1,128 from 17 patterns | ARC: 99 unfactored (check-in hall) | Most complex floor |
| L01 | FP: 1,182 from 7 patterns | SP: 113 unfactored pipes | Sanitary plumbing needs ROUTE |
| L04 | FP: 1,065 from 6 patterns | SP: 4 from 1 pattern, 4 unfactored | Well factored |
| FDN | STR: 427 from 4 patterns | SP: 128 unfactored pipes | Underground MEP irregular |

Infrastructure Corruption Precedent

The reference/infrastructure/ directory contains 9 IFC4X3_ADD2 files (roads, bridges, railways). When these were previously processed through the building-only extraction path, the pipeline corrupted because:

  1. get_storey_for_element() only recognizes IfcBuildingStorey — all infrastructure elements became storey="Unknown"
  2. UNIQUE constraint on (building_type, storey, ifc_class, ordinal) broke — all elements in one storey caused ordinal collisions
  3. Cascade: degenerate BOM → BomValidator FAIL → pipeline abort

Guard: Infrastructure IFCs use IfcFacilityPart (IfcRoadPart, IfcBridgePart, IfcRailwayPart) instead of IfcBuildingStorey. The extraction layer must FAIL early on IFC4X3 files with facility parts but no building storeys until support is implemented.
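
The early-fail guard could be sketched as follows. This is an assumption-laden illustration, not the extraction layer's code: `StoreyGuard` is a hypothetical class, and it assumes the caller has already collected the schema identifier and the set of spatial-structure class names found in the file.

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical early-fail guard for infrastructure IFC files. */
final class StoreyGuard {
    private static final Set<String> FACILITY_PARTS =
            Set.of("IfcFacilityPart", "IfcRoadPart", "IfcBridgePart", "IfcRailwayPart");

    /** True for IFC4X3 files that have facility parts but no building storeys. */
    static boolean isUnsupportedInfrastructure(String schema, Collection<String> spatialClasses) {
        boolean hasStorey = spatialClasses.contains("IfcBuildingStorey");
        boolean hasFacilityPart = spatialClasses.stream().anyMatch(FACILITY_PARTS::contains);
        return schema.startsWith("IFC4X3") && hasFacilityPart && !hasStorey;
    }

    /** Fails before extraction starts, instead of letting storey="Unknown" reach the BOM. */
    static void requireSupported(String schema, Collection<String> spatialClasses) {
        if (isUnsupportedInfrastructure(schema, spatialClasses)) {
            throw new IllegalStateException(
                    "Infrastructure IFC without IfcBuildingStorey is not supported yet: " + schema);
        }
    }
}
```

Failing here keeps the problem visible at the first stage rather than surfacing later as ordinal collisions or a BomValidator failure.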

TE is safe: Terminal is IFC2x3 with standard IfcBuildingStorey. No facility parts. The corruption risk applies only to IFC4X3 infrastructure files, not to TE.

Full analysis: InfrastructureAnalysis.md.


Post-TE-4 BOM Model Analysis (2026-03-16)

BOM Hierarchy: BUILDING → FLOOR → DISCIPLINE → LEAF

BUILDING_TE_STD (73,670 x 59,124 x 59,818 mm)
  ├── TE_FDN  [Foundation]    703 active,  5 disciplines
  ├── TE_GF   [Ground Floor] 3,513 active, 8 disciplines
  ├── TE_L01  [Level 1]      2,070 active, 6 disciplines
  ├── TE_L02  [Level 2]      2,609 active, 8 disciplines
  ├── TE_L03  [Level 3]      1,798 active, 7 disciplines
  ├── TE_L04  [Level 4]      2,307 active, 7 disciplines
  └── TE_RF   [Roof]        35,428 active, 8 disciplines
                             ------
                             48,428 placement instances in 50 leaf-group BOMs
                             (unfactored — each instance is a separate m_bom_line row)

Envelope Protrusion — Awnings and Canopies

ARC discipline extends beyond the STR structural envelope:

| Axis | STR range (m) | ARC range (m) | ARC protrusion (m) |
|---|---|---|---|
| X (width) | 64.06 | 73.67 | +9.61 |
| Y (depth) | 42.10 | 56.12 | +14.02 |

The ARC envelope (84.6–158.3m X, -48.2–7.9m Y) extends beyond STR (88.9–153.0m X, -41.2–0.9m Y) on every side: about 4.3m and 5.3m at the two X ends and about 7.0m at each Y end, which together give the total protrusions of roughly 9.6m and 14.0m. This is the terminal's awning/canopy system: IfcPlate elements on the Roof storey (33,324 plates) that overhang the structural frame. The LPG discipline at -51.2m Y extends furthest south (underground gas piping below the apron).

The BUILDING BOM AABB (73.67 x 59.12 x 59.82m) encompasses ALL disciplines including protrusions. Each FLOOR AABB is computed from its own elements, so floor W/D may exceed the BOM containment rule — this is expected for awning/canopy overhangs.

BomCategory Structure

58 BOMs total: 1 root + 7 floor-level + 50 leaf-group BOMs

| BomCategory | Count | Role |
|---|---|---|
| ARC | 7 | Architectural: plates, walls, doors, windows, furniture |
| STR | 7 | Structural: columns, beams, slabs |
| FP | 7 | Fire protection: sprinklers, alarms, pipe segments |
| CW | 7 | Cold water: pipe segments, fittings, valves |
| SP | 7 | Sewerage/plumbing: pipe segments, fittings |
| ACMV | 6 | Air conditioning: air terminals, ducts (no Foundation) |
| ELEC | 6 | Electrical: light fixtures, building element proxies (no Foundation) |
| LPG | 3 | Gas: pipe fittings, segments (Foundation + GF + L1 only) |
| FN/GF/L1-L4/RF | 7 | Storey-level containers |

Not all disciplines appear on all storeys. LPG only reaches Level 1 (gas risers stop at low levels). ACMV and ELEC skip Foundation (no MEP below grade).

Tack I/O — Layer-to-Layer Offset Chain

Current implementation (centroid-floorMin — DRIFTED from spec):

BOMWalker tack accumulation (4 levels):

  BUILDING origin = (allMinX, allMinY, allMinZ)
  + FLOOR offset  = (floorMinX - allMinX, floorMinY - allMinY, floorMinZ - allMinZ)
  + DISCIPLINE    = (0, 0, 0)  ← logical grouping, no spatial offset
  + LEAF centroid  = (centroidX - floorMinX, centroidY - floorMinY, centroidZ - floorMinZ)
  ─────────────────
  = element centroid (world coordinates)   ← CORRECT positions, WRONG convention

Spec-compliant implementation (BOMBasedCompilation.md §4):

  BUILDING origin  = (allMinX, allMinY, allMinZ)         ← building LBD (world)
  + FLOOR (dx,dy,dz) = where floor's LBD sits in building   ← tack_from (3D, always >= 0)
  + DISCIPLINE       = (0, 0, 0)                             ← logical grouping, no spatial offset
  + LEAF (dx,dy,dz)  = where element's LBD sits in parent   ← tack_from (3D, always >= 0)
  + BUFFER fills  = parent AABB − SUM(children AABB)     ← completeness invariant
  ─────────────────
  = element LBD (world coordinates)       ← CORRECT positions, CORRECT convention
  centroid = element LBD + (width/2, depth/2, height/2)  ← output stage only

What changes: LEAF dx is the position where the element's LBD corner sits within the parent — no longer a centroid offset. BUFFER lines fill the gaps between children so parent AABB = SUM(children) (the validateBOM() invariant). World positions remain identical; centroid is recovered at output.

The DISCIPLINE layer is transparent to tacking — zero offset means the walker accumulates through it without error. This is the key design insight: discipline is a logical container (ERP grouping) not a spatial one.
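
The spec-compliant chain reduces to summing non-negative LBD offsets and recovering the centroid only at output. The sketch below illustrates that arithmetic; `TackChain` is a hypothetical helper, not BOMWalker itself, and the numbers in the usage note are made up for the illustration.

```java
/** Hypothetical illustration of LBD tack accumulation. */
final class TackChain {
    /** Adds each parent-relative offset (dx, dy, dz) onto the building origin. */
    static double[] worldLbd(double[] buildingOrigin, double[]... offsets) {
        double[] p = buildingOrigin.clone();
        for (double[] o : offsets) {
            for (int i = 0; i < 3; i++) {
                if (o[i] < 0) {
                    throw new IllegalArgumentException("tack offset must be >= 0");
                }
                p[i] += o[i];
            }
        }
        return p;
    }

    /** Output-stage step: centroid = LBD + (width/2, depth/2, height/2). */
    static double[] centroid(double[] lbd, double width, double depth, double height) {
        return new double[] { lbd[0] + width / 2, lbd[1] + depth / 2, lbd[2] + height / 2 };
    }
}
```

For example, an origin of (100, -50, -30), a floor offset of (0, 0, 30), the zero discipline offset, and a leaf offset of (5, 2.5, 0) yield a world LBD of (105, -47.5, 0). Because the discipline offset is zero, including or omitting it does not change the result.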

EN-BLOC vs WALK THRU

  • EN-BLOC: reads all 48,428 placement rows with pre-computed dx/dy/dz. Each row already has parent-relative offsets. Takes each as-is when AABB and DocType (CO_TE) are consistent. ~25 min for 48K instances.

  • WALK THRU: re-derives positions by tacking through the 4-level hierarchy. Proves the BOM structure is self-consistent. Both paths must produce identical output. Currently slow at 48K elements — verb compression (TE-6/7) will reduce to ~2,500 BOM lines.

Dominant Element: Roof IfcPlate (33,324 = 69%)

The roof deck dominates: 33,324 IfcPlate elements under ARC/Roof. These are modular metal deck panels forming the terminal's characteristic undulating roof canopy. Analysis of the reference DB shows regular grid patterns (X-step ~495mm, Y-step ~150mm) across 9 Z-bands — ideal for TILE SURFACE verb compression to ~20 panel formulas.

Compression Roadmap

| Phase | Verb | Elements | → BOM Lines | Ratio |
|---|---|---|---|---|
| TE-6 | TILE SURFACE | 33,324 roof plates | ~20 | 1,666x |
| TE-7a | ROUTE | ~13K pipe/duct | ~200 | 65x |
| TE-7b | WIRE LIGHTING | ~2K fixtures | ~50 | 40x |
| TE-7c | FRAME | ~590 col/beam | ~20 | 30x |
| | flat | ~2,123 irregular | 2,123 | 1x |
| Total | | 48,428 | ~2,500 | 19x |

At the YAML/OrderLine layer: ~235 declarations → 48,428 placements = 206x.

CP-4 Geometric Archetype (S44)

The compiler must not branch on IFC class. 43 decision points were identified that switch on IFC class strings, violating BBC.md §2.2.1 (class-agnostic compilation). TE's 33,324 IfcPlate elements are actually Metal Deck (107×150×500mm, planarity=0.21), yet the compiler treated them all as CURTAIN_PANEL based on the IFC class label.

Three-layer solution:

  1. Geometric archetype (PLANAR/ELONGATED/COMPACT/MIXED + scale band) from dimensions
  2. Component library (component_definitions, M_Product, placement_rules) for semantic identity
  3. IFC class: traceability metadata only, never a decision variable

Foundation delivered S44: GeometricFingerprint.java, P10_SHAPE_IDENTITY, GeometricFingerprintTest.java. Phases 4a–4e in ACTION_ROADMAP.md §CP-4.


Coding Specs — TE-5B: 216 IfcSlab Gap Fix (2026-03-17)

Problem Statement

TE compiles 48,212 output elements but BOM has 48,428 placement rows. The gap is exactly 216 IfcSlab (extraction: 705, output: 489). Every other IFC class matches exactly. Additionally, 5 IfcSlab are lost at extraction→BOM (705 active → 700 BOM rows).

Root Cause Chain (3 bugs, 1 design gap)

Bug 1: element_ref = product type name, not element GUID

I_Element_Extraction.element_ref stores the Revit Family:Type string (e.g. Floor:S_Slab_200_RC_Flat_V1), not a per-element GUID. The Python extractor puts {Family}:{Type} in this field. This means 700 IfcSlab BOM lines have only 30 distinct element_ref values. The largest group is jkrST_str-fo_pc_rcp: 300 x 300mm with 236 occurrences.

SH/DX happen to work because their element_ref values are more unique (fewer identical types). TE exposes the latent assumption that element_ref = unique ID.

Bug 2 (REVISED): StoreyCompiler consumes element_ref by product type

The real root cause is NOT GUID collision. StoreyCompiler generates structural slabs (Stage 3) and marks element_refs as consumed. Since element_ref is a product type name, PlacementLoader.markConsumed("Floor:S_Slab_200_RC_Flat_V1") consumes ALL 189 elements of that type. The extracted placement path (Stage 4) then skips all of them.

Output evidence: IfcSlab GUIDs are SLAB_GROUND FLOOR_UNIT_* (StoreyCompiler pattern), not STR_MD_SLAB_GROUND_FLOOR_* (extracted pattern). The 489 output slabs are StoreyCompiler-generated from computed bay dimensions, not BOM positions.

Design Gap (FIXED): deriveDiscipline() ignores extraction discipline

PlacementCollectorVisitor.deriveDiscipline() mapped IfcSlab → "STR" always. Fixed in TE-5C: disciplineStack now carries the authoritative discipline from the parent SET BOM's bom_category. resolveDiscipline() prefers stack over static mapping. Falls back to deriveDiscipline() for SH/DX.

Spec 1: Unique element_ref via placement_id

File: ExtractionPopulator.java (or Python extract.py)

The element_ref column in I_Element_Extraction must hold a value unique per element placement, not per product type. Options:

| Option | Value | Uniqueness | Breaking change |
|---|---|---|---|
| A (recommended) | {storey}:{ifc_class}:{ordinal} | Unique per extraction | Low: ordinal already exists |
| B | placement_id (autoincrement) | Unique by definition | Medium: changes downstream joins |
| C | IFC GlobalId | Unique per IFC spec | High: requires Python extractor change |

Recommendation: Option A. Compose element_ref as {storey}:{ifc_class}:{ordinal} at extraction time. This is deterministic (reproducible from same IFC file), unique per element, and requires no Python extractor changes (ordinal already computed). The DisciplineBomBuilder passes e.elementRef() through unchanged.

Guard: After implementing, assert COUNT(DISTINCT element_ref) = COUNT(*) on I_Element_Extraction WHERE is_active=1 in BomValidator.
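
Option A and its guard can be sketched in a few lines. `ElementRefs` is a hypothetical helper, not code from ExtractionPopulator or BomValidator. The sketch assumes the ordinal is unique within each (storey, ifc_class) pair and that neither name contains a colon.

```java
import java.util.HashSet;
import java.util.List;

/** Hypothetical helper for Option A: {storey}:{ifc_class}:{ordinal}. */
final class ElementRefs {
    /** Composes a per-placement reference from fields the extraction already has. */
    static String compose(String storey, String ifcClass, int ordinal) {
        return storey + ":" + ifcClass + ":" + ordinal;
    }

    /** In-memory equivalent of COUNT(DISTINCT element_ref) = COUNT(*). */
    static boolean allUnique(List<String> refs) {
        return new HashSet<>(refs).size() == refs.size();
    }
}
```

Two slabs of the same Revit type on the same storey then get distinct references such as `Ground Floor:IfcSlab:7` and `Ground Floor:IfcSlab:8`, whereas the current `{Family}:{Type}` value is identical for both.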

Spec 2 (REVISED → SUPERSEDED by S100-p72): BOM walk compiler

S100-p72 replaced CompileStage entirely. The old shouldSkip() + emitGlobalPlacementElements() path is gone. All buildings now compile via BOMWalker + PlacementCollectorVisitor — single path, no skip logic. See DISC_VALIDATION_DB_SRS.md §10.4.1 (shouldSkip is an anti-pattern).

Result: G1-COUNT 48,428 = 48,428. IfcSlab 489 → 705. SH/DX zero regression.

Spec 3: Propagate extraction discipline through BOM to placement — DONE

File: PlacementCollectorVisitor.java

Implemented in TE-5C. disciplineStack pushes bom_category from SET-level BOMs in onSubAssembly, pops in onSubAssemblyComplete. resolveDiscipline() prefers stack over deriveDiscipline() static mapping.

Spec 4: Expected element count — active only

File: run_RosettaStones.sh:157. DONE (2026-03-17).

Changed SELECT COUNT(*) to include AND is_active = 1. Verified SH/DX unaffected (no deactivated elements).

Spec 5: 5 missing IfcSlab at extraction→BOM

Diagnosis needed. 705 active IfcSlab in extraction, 700 in BOM. 5 elements lost somewhere in DisciplineBomBuilder. Likely cause: storey mismatch or product lookup failure. Add diagnostic logging to DisciplineBomBuilder when an extraction element doesn't produce a BOM line.

Implementation Order

  1. Spec 4 ✅ (done — is_active=1 in expected count)
  2. Spec 3 ✅ (done — discipline stack in PlacementCollectorVisitor)
  3. Spec 2 ✅ (done; superseded by the S100-p72 BOM walk compiler, 216 gap closed)
  4. Spec 1 — unique element_ref (defensive, for future WYSIWYG gates)
  5. Spec 5 — diagnose 5 missing slabs at extraction→BOM (minor)

Verification

After Spec 2: rm TE_BOM.db && ./scripts/run_RosettaStones.sh classify_te.yaml

  • G1-COUNT: expected 48,428, actual must equal 48,428
  • Delta: enbloc == walkthru (0 difference)
  • Output IfcSlab GUIDs should be STR_MD_SLAB_* / ARC_MD_SLAB_* (extracted), not SLAB_GROUND FLOOR_UNIT_* (StoreyCompiler)


Learning Points — TE-5 Pipeline Plumbing (2026-03-17)

L1: Surefire forks a new JVM — CLI -D properties don't pass through

Maven's surefire plugin forks a separate JVM to run tests. System properties passed on the Maven CLI (-Dbom.db=...) are Maven properties, NOT JVM system properties in the forked process. You must explicitly forward them:

<configuration>
    <systemPropertyVariables>
        <bom.db>${bom.db}</bom.db>
        <bom.mode>${bom.mode}</bom.mode>
        <doc.base.type>${doc.base.type}</doc.base.type>
    </systemPropertyVariables>
</configuration>

Symptom: System.getProperty("bom.db") returns null in tests, even though the shell script passes -Dbom.db=... on the Maven command line. Tests PASS (via assumeTrue skip), no output DB produced, zero visible error.

Trap: This is invisible in SH/DX when tests are excluded from GATE_SCOPE. The test silently skips, Maven exits 0, shell interprets as "compiled OK".

L2: GATE_SCOPE must be kept in sync across test classes

RosettaStoneGateTest.GATE_SCOPE and BuildingRegistryTest.GATE_SCOPE are independent Set<String> constants. Adding CO_TE to one doesn't add it to the other. Both must be updated when a new building enters the pipeline.

Trap: BuildingRegistryTest uses assumeTrue(GATE_SCOPE.contains(...)). When a docTypeId is missing from GATE_SCOPE, the test is silently skipped (not failed). Maven reports 0 failures. The shell script sees exit code 0 and says "compiled OK" — but no test actually ran.

L3: element_ref is NOT a unique element identifier in federated IFC

In federated models (Terminal = 9 discipline files merged), element_ref from the Python extractor is {Family}:{Type} (Revit nomenclature). This is a product type name, not a per-element GUID. Examples:

Metal Deck:Metal Deck           → 33,324 occurrences (all roof plates)
M_Concrete-Rectangular Beam:... → 126 occurrences (same beam type)
Floor:S_Slab_200_RC_Flat_V1     → 189 occurrences (same slab type)

SH/DX happened to work because their models have fewer identical-type elements, so element_ref was effectively unique. TE's scale (51K elements, 505 products) broke the latent assumption.

Rule: Never assume element_ref is unique. Use (building_type, storey, ifc_class, element_ref, ordinal) as the composite key, or synthesize a unique ID from these fields.

L4: Silent UNIQUE constraint catch hides data loss

ElementPersistence.writeElementMeta() catches UNIQUE constraint violations and returns false. This was correct for DX multi-unit merge (intentional deduplication of shared perimeter walls). But in TE, the same catch silently drops legitimate elements whose GUIDs happen to collide due to ordinal reuse.

Rule: The UNIQUE-catch pattern is safe only when the caller knows duplicates are expected. For CO-mode compilation, GUID construction must guarantee uniqueness BEFORE the INSERT, not rely on the DB to deduplicate.

L5: deriveDiscipline(ifcClass) is a lossy function

The static mapping IfcSlab → STR discards information that the extraction already knows. A slab in TE_GF_ARC is an architectural floor finish; a slab in TE_GF_STR is a structural slab. Both are IfcSlab but serve different roles. The BOM hierarchy preserves this context, but it's lost at the flat placement stage because deriveDiscipline only looks at the IFC class name.

Rule: Discipline is a property of the BOM context (which discipline SET the element belongs to), not a function of the IFC class alone. The walker must carry discipline through the hierarchy, like it carries storey.
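
A stack-first resolver along these lines is sketched below. It simplifies the real visitor: `DisciplineResolver` is a hypothetical class, it pushes on every sub-assembly rather than only SET-level BOMs, and the static fallback maps only IfcSlab (the one mapping stated in this document).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: BOM context wins over the static IFC-class mapping. */
final class DisciplineResolver {
    private final Deque<String> disciplineStack = new ArrayDeque<>();

    /** Entering a discipline BOM: remember its bom_category. */
    void onSubAssembly(String bomCategory) {
        disciplineStack.push(bomCategory);
    }

    /** Leaving the discipline BOM: restore the previous context. */
    void onSubAssemblyComplete() {
        disciplineStack.pop();
    }

    /** Prefers the enclosing BOM's discipline; falls back to the lossy static mapping. */
    String resolve(String ifcClass) {
        String fromBom = disciplineStack.peek();
        return fromBom != null ? fromBom : derive(ifcClass);
    }

    /** Placeholder static mapping; only the IfcSlab entry comes from the text. */
    static String derive(String ifcClass) {
        return "IfcSlab".equals(ifcClass) ? "STR" : "UNKNOWN";
    }
}
```

Inside an ARC discipline BOM an IfcSlab resolves to ARC. With an empty stack, as on the SH/DX fallback path, it resolves to STR.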

L6: assumeTrue masks pipeline failures as green

JUnit 5 assumeTrue(condition) causes a test to be skipped, not failed. Surefire counts skipped tests as non-failures. Maven exits 0. Shell scripts that check $? see success. The entire pipeline can be silently non-functional with all-green verdicts.

Guard: When a test is skipped unexpectedly, the script should detect Tests run: 1, Failures: 0, Errors: 0, Skipped: 1 and treat Skipped > 0 as a warning, or require that at least 1 test actually passed.
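
One way to implement that guard is to parse the summary line and require at least one real pass. The sketch is hypothetical (`SurefireSummary` is not part of the pipeline) and assumes that "Tests run" includes skipped tests, as in the line quoted above, and that the last summary line in the log is the aggregate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for a Surefire-style summary line in a build log. */
final class SurefireSummary {
    private static final Pattern SUMMARY = Pattern.compile(
            "Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+), Skipped: (\\d+)");

    /** True only if the last summary shows no failures or errors and at least one non-skipped test. */
    static boolean atLeastOnePassed(String log) {
        Matcher m = SUMMARY.matcher(log);
        boolean found = false;
        int run = 0, failures = 0, errors = 0, skipped = 0;
        while (m.find()) { // keep the last match
            found = true;
            run = Integer.parseInt(m.group(1));
            failures = Integer.parseInt(m.group(2));
            errors = Integer.parseInt(m.group(3));
            skipped = Integer.parseInt(m.group(4));
        }
        if (!found) {
            return false; // no summary at all is also not a pass
        }
        int passed = run - failures - errors - skipped;
        return failures == 0 && errors == 0 && passed > 0;
    }
}
```

A log ending in `Tests run: 1, Failures: 0, Errors: 0, Skipped: 1` is rejected by this check even though Maven exits 0.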

L7: IfcSlab has two code paths — StoreyCompiler vs extracted placements

The compilation pipeline has two code paths for IfcSlab:

  1. StoreyCompiler path: Generates slab geometry from bay/floor dimensions. Produces GUIDs like SLAB_GROUND FLOOR_UNIT_1. This is the "compiled" path — the compiler invents slab geometry based on storey dimensions and structural grid, not from extracted element positions.

  2. Extracted placement path: emitExtractedElements() in BuildingWriter writes extracted elements with GUIDs like STR_MD_SLAB_FOUNDATION_10. Uses element positions from the BOM.

In SH/DX (RE mode), all elements go through the extracted path. In TE (CO mode), IfcSlab may be consumed by StoreyCompiler.applyPlacementOverrides() which marks element_refs as consumed via PlacementLoader.markConsumed(). Subsequent extracted placements with the same element_ref are skipped at line 959 of BuildingWriter: if (isConsumed(...)) continue;

Key insight: With non-unique element_ref (product type names), marking one slab element_ref as consumed (e.g. Floor:S_Slab_200_RC_Flat_V1) skips ALL 189 elements with that same type. This is why the gap is concentrated in IfcSlab — StoreyCompiler produces a few slabs per storey but consumes the element_ref for ALL slabs of that type.

Evidence: Output GUIDs for IfcSlab are SLAB_GROUND FLOOR* (StoreyCompiler), not STR_MD_SLAB_GROUND_FLOOR_* (extracted path). The 489 output slabs are StoreyCompiler-generated, not extracted placements.

Fix direction, either:

  • Disable StoreyCompiler slab generation for CO mode (slabs come from BOM), or
  • Make isConsumed() match on (element_ref, ordinal), not just element_ref

L8: The compilation pipeline is a sequence of consumers

Understanding the pipeline's internal flow is critical for debugging:

CompilationPipeline.run()
  ├── Stage 1: TEMPLATE (ST mode only — skipped for RE/CO)
  ├── Stage 2: LOAD — PlacementLoader reads BOM, BOMWalker collects placements
  ├── Stage 3: STOREY — StoreyCompiler generates structural slabs, bay slabs
  │   └── Marks element_refs as "consumed" (applyPlacementOverrides)
  ├── Stage 4: WRITE — BuildingWriter emits elements to output DB
  │   ├── emitCompiledElements() — from StoreyCompiler (slabs, columns, beams)
  │   └── emitExtractedElements() — from BOM placements (skips consumed refs)
  ├── Stage 5: SURFACE — surface styles from component_library.db
  ├── Stage 6: PROVER — PlacementProver verifies spatial properties
  └── Stage 7: SHADOW — cross-check against reference DB

The STOREY stage runs BEFORE WRITE. It generates slab/bay elements from computed dimensions and marks element_refs as consumed. Then WRITE's extracted path skips consumed refs. This is correct for SH/DX where element_ref is unique — consuming Floor_GF_01 consumes exactly one slab. But for TE where element_ref is a product type name, consuming Floor:S_Slab_200_RC_Flat_V1 consumes ALL 189 slabs of that type.

Rule for new buildings: If a new building uses CO mode (discipline BOMs), check whether StoreyCompiler generates structural slabs. If so, either disable slab generation (BOM already provides slabs) or ensure element_ref uniqueness so isConsumed() doesn't over-consume.
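
The over-consumption effect is easy to reproduce in isolation. The sketch below is a toy model, not PlacementLoader or BuildingWriter: `ConsumeDemo` is hypothetical, and it simply compares a consumed-set keyed by element_ref alone with one keyed by (element_ref, ordinal).

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the STOREY -> WRITE consumed-ref interaction. */
final class ConsumeDemo {
    /** Counts how many of n same-type placements are skipped after one of them is consumed. */
    static int skipped(String elementRef, int n, boolean keyIncludesOrdinal) {
        Set<String> consumed = new HashSet<>();
        consumed.add(key(elementRef, 0, keyIncludesOrdinal)); // STOREY stage consumes one slab

        int skipped = 0;
        for (int ordinal = 0; ordinal < n; ordinal++) {       // WRITE stage, extracted path
            if (consumed.contains(key(elementRef, ordinal, keyIncludesOrdinal))) {
                skipped++;
            }
        }
        return skipped;
    }

    private static String key(String elementRef, int ordinal, boolean keyIncludesOrdinal) {
        return keyIncludesOrdinal ? elementRef + "#" + ordinal : elementRef;
    }
}
```

With 189 placements sharing `Floor:S_Slab_200_RC_Flat_V1`, keying on element_ref alone skips all 189, while keying on (element_ref, ordinal) skips only the one that was actually consumed.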

Verb Fidelity — What TE Gates Actually Prove

TE passes all 7 gates (G1-G5, compile, delta). But the gates check aggregates, not per-verb correctness. This section documents what is and isn't proven.

What the gates prove

| Gate | What it verifies | Coverage |
|---|---|---|
| G1-COUNT | Total element count matches extraction | Exact (48,428 = 48,428) |
| G2-VOLUME | Sum of AABB volumes matches | Exact (+0.00%) |
| G3-DIGEST | SHA-256 of sorted coordinates | PASS (4 IfcSensor removed, Federation addon) |
| G5-PROVENANCE | Every geometry_hash exists in library | All elements verified |
| QA (step 9) | BOM structure, duplicates, orphans, AABB containment | 15 checks, all PASS |
| Verb fidelity (step 9b) | Round-trip LBD comparison | FRAME/TILE/CLUSTER exact (gating); ROUTE advisory |

Schema-Not-Geometry classification: ERP-maths. Verb factorization (TILE, ROUTE, SPRAY, FRAME) takes extracted AABB positions and compresses them into recipe formulas (grid step, path topology, cluster offsets). No IFC relationship exists for "these elements form a grid pattern" — the pattern recognition IS manufacturing recipe logic, same as M12 pipe clearance: arithmetic on positions is the correct method. Not a schema gap. See BIM_COBOL.md §20 for the spatial predicate verbs that standardise these queries.

What the gates don't prove (TE-specific gaps)

  1. Per-element position verification — DONE (2026-03-18). G3-DIGEST now PASS (4 IfcSensor removed — Federation addon). TotalityContractTest covers CO_TE.

  2. Rotation verification — DONE (2026-03-18). RotationContractTest covers CO_TE. TE doors/windows now have W/D alignment check.

  3. ROUTE step uniformity. VerbDetector accepts non-uniform element spacing as a ROUTE pattern. Expansion assumes a uniform step, so intermediate positions drift from extraction centroids. Traced example:

     - 5 pipe fittings at X = [90.8, 103.6, 108.3, 133.8, 138.5]
     - Detected as ROUTE:X:11.9:5 (avg step = (138.5 - 90.8) / 4 = 11.9m)
     - Expansion places at: 90.8, 102.7, 114.6, 126.6, 138.5
     - Actual positions: 90.8, 103.6, 108.3, 133.8, 138.5
     - Error: up to 7.2m on intermediate elements (endpoints always match)

  4. SPRAY grid approximation. SPRAY uses median step with 10% tolerance. The grid approximation diverges further from actual positions than ROUTE.
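The traced ROUTE example can be checked with a short calculation. This is a minimal sketch, not VerbDetector or VerbExpander code: the class and method names are hypothetical, and it only reproduces the uniform-step arithmetic on the five X positions quoted above.

```java
/** Illustrative only: reproduces the uniform-step arithmetic of the traced ROUTE example. */
final class RouteStepDrift {
    /** Uniform expansion: first + i * (last - first) / (n - 1), for sorted input of length >= 2. */
    static double[] expandUniform(double[] sortedActual) {
        int n = sortedActual.length;
        if (n < 2) throw new IllegalArgumentException("need at least two positions");
        double step = (sortedActual[n - 1] - sortedActual[0]) / (n - 1);
        double[] expanded = new double[n];
        for (int i = 0; i < n; i++) {
            expanded[i] = sortedActual[0] + i * step;
        }
        return expanded;
    }

    /** Largest absolute difference between positionally matched values. */
    static double maxAbsError(double[] actual, double[] expanded) {
        double max = 0.0;
        for (int i = 0; i < actual.length; i++) {
            max = Math.max(max, Math.abs(actual[i] - expanded[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] actual = {90.8, 103.6, 108.3, 133.8, 138.5};
        double[] expanded = expandUniform(actual);
        for (int i = 0; i < actual.length; i++) {
            System.out.printf("i=%d actual=%.1f expanded=%.3f error=%.3f%n",
                i, actual[i], expanded[i], Math.abs(actual[i] - expanded[i]));
        }
        System.out.printf("max error = %.3f m%n", maxAbsError(actual, expanded));
    }
}
```

The unrounded step is 11.925 m, and the worst intermediate error (fourth fitting, 133.8 vs 126.575) is about 7.2 m, matching the figure in the list. The endpoints match by construction because the step is derived from them.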

Fidelity check mechanics

BomValidator.checkVerbExpansionFidelity() (step 9b) performs a round-trip:

  1. Read verb-factored BOM lines (verb_ref IS NOT NULL)
  2. Read extraction centroids from component_library.db
  3. Group both by storey|discipline|product (R9 fix — was storey|product)
  4. Expand verb_ref to positions, add floor AABB min for world coordinates
  5. Sort both sets, match positionally, compute Euclidean distance

The grouping key fix (R9) eliminated 993 count mismatches caused by mixing centroids from different discipline BOMs. The residual distance errors are real: ROUTE's uniform-step assumption vs non-uniform actual spacing (R8 TODO).
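Steps 3 and 5 of the round-trip can be sketched as follows. This is an assumption-laden illustration, not BomValidator code: `FidelitySketch`, `groupKey`, and `maxDistance` are hypothetical names, points are plain `double[3]` arrays, and the sort order is assumed to be lexicographic on (x, y, z).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: grouping key plus sort-and-match distance check. */
final class FidelitySketch {
    /** Lexicographic order on (x, y, z); the real sort key is an assumption here. */
    static final Comparator<double[]> XYZ = Comparator
        .<double[]>comparingDouble(p -> p[0])
        .thenComparingDouble(p -> p[1])
        .thenComparingDouble(p -> p[2]);

    /** Grouping key after the R9 fix: discipline is part of the key. */
    static String groupKey(String storey, String discipline, String product) {
        return storey + "|" + discipline + "|" + product;
    }

    /** Sorts both sets, matches them index by index, returns the largest Euclidean distance. */
    static double maxDistance(List<double[]> expanded, List<double[]> extracted) {
        if (expanded.size() != extracted.size()) {
            throw new IllegalArgumentException("count mismatch: "
                + expanded.size() + " vs " + extracted.size());
        }
        List<double[]> a = new ArrayList<>(expanded);
        List<double[]> b = new ArrayList<>(extracted);
        a.sort(XYZ);
        b.sort(XYZ);
        double max = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double dx = a.get(i)[0] - b.get(i)[0];
            double dy = a.get(i)[1] - b.get(i)[1];
            double dz = a.get(i)[2] - b.get(i)[2];
            max = Math.max(max, Math.sqrt(dx * dx + dy * dy + dz * dz));
        }
        return max;
    }

    public static void main(String[] args) {
        List<double[]> expanded = List.of(new double[] {1, 0, 0}, new double[] {0, 0, 0});
        List<double[]> extracted = List.of(new double[] {0, 0, 0}, new double[] {1, 3, 4});
        System.out.println(groupKey("GF", "FP", "Sprinkler") + " -> " + maxDistance(expanded, extracted));
    }
}
```

Mixing centroids from two disciplines under one key would trip the count check in `maxDistance`, which is the kind of mismatch the R9 key change removed.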

Coordinate chain

DisciplineBomBuilder writes:
  fMinX = MIN(minX) across all elements on storey (AABB min)
  dx = centroidX - fMinX                          (floor-relative)
  makeDx = fMinX - allMinX                        (floor origin vs building origin)

VerbDetector stores:
  origin = first_centroid - fMinX                  (pattern origin, floor-relative)
  step = (last - first) / (count - 1)             (uniform average)

Fidelity checker reconstructs:
  expanded = origin + i*step                       (floor-relative positions)
  world = expanded + floorAabbMin                  (world coordinates)
  compare against extraction centroids             (also world coordinates)

Error source: step is an average, not the actual per-element spacing.

Recurrence Analysis — Cross-Floor Pattern Sharing

Question: Can the TE BOM be further compressed via recurrence — reusable sub-BOMs that appear identically on multiple floors?

Observation: The same product types repeat across floors:

- Poly Steel pipes appear on all 7 floors
- UPVC pipes on 6 floors
- Sprinkler heads (K80) on 5 floors
- Light fixtures (600x600) on 5 floors

Current state: 1,131 LEAF lines (42.8:1 via CLUSTER). Each floor's discipline BOMs are independent — FP_TE_GF and FP_TE_L01 both contain sprinkler lines but share no sub-BOM reference.

Recurrence candidates:

| Pattern | Floors | Elements/floor | Potential sub-BOM |
|---|---|---|---|
| FP sprinkler grid | 5 (GF-L3) | ~900 | Sprinkler bay template + M_BOM_Line offset per floor |
| ELEC lighting grid | 5 (GF-L3) | ~160 | Ceiling light template + M_BOM_Line offset per floor |
| ACMV duct run | 4 (GF-L2) | ~300 | Duct main template with branch variants |
| CW pipe riser | 7 (all) | ~100 | Vertical riser template (per-floor instance attrs) |

Challenge: Floors are NOT identical — element counts vary (GF=3,513 vs L4=2,307), spacing differs, building footprint tapers. True typical-floor recurrence requires that the sub-BOM pattern (product set + relative offsets) matches exactly.

Approach: Investigate whether discipline sub-BOMs on adjacent floors share the same product-set signature (ignoring absolute position). If yes, a template sub-BOM with per-floor M_BOM_Line offset placement compresses further. If not, the current per-floor independent BOMs are correct and 42.8:1 is the natural compression limit.
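One way to start that investigation is a position-independent signature per floor. The sketch below is a simplified assumption: it reduces a discipline sub-BOM to its sorted set of product names and ignores offsets and counts. The class, method names, and product labels are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Illustrative only: compares floors by product set, ignoring position and quantity. */
final class RecurrenceSignatureSketch {
    /** Sorted, de-duplicated product names joined into one comparable string. */
    static String signature(Collection<String> productNames) {
        TreeSet<String> sorted = new TreeSet<>(productNames);
        return String.join("|", sorted);
    }

    /** True when two floors use exactly the same set of products. */
    static boolean sameProductSet(Collection<String> floorA, Collection<String> floorB) {
        return signature(floorA).equals(signature(floorB));
    }

    public static void main(String[] args) {
        // Hypothetical product names for two FP sub-BOMs.
        List<String> gf = List.of("Sprinkler_K80", "Pipe_PolySteel", "Sprinkler_K80");
        List<String> l01 = List.of("Pipe_PolySteel", "Sprinkler_K80");
        List<String> l04 = List.of("Pipe_PolySteel", "Pipe_UPVC");
        System.out.println("GF vs L01: " + sameProductSet(gf, l01)); // true
        System.out.println("GF vs L04: " + sameProductSet(gf, l04)); // false
    }
}
```

A matching signature is only a necessary condition. Relative offsets would still have to agree before a shared template sub-BOM is valid.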

This is a future investigation — does not block current compilation. The spec (BOMBasedCompilation.md §3.3) supports recurrence via M_Product_Category_Line templates.



TE Compilation Status — Post-Flatten BOM Audit (S100-p69, 2026-03-28)

TE BOM persisted. The S100-p66 flatten (BUILDING→FLOOR→LEAF) resolved W-TACK-1 and W-BUFFER-1. TE_BOM.db is now populated. QA all PASS. Compile-path blockers remain — TE is extraction-only until prompt 71.

IFCtoBOM QA — All PASS (post-flatten)

| Check | Result |
|---|---|
| BOM count | 8 (1 BUILDING + 7 FLOOR) |
| BOM lines | 1,522 lines → 48,435 instances |
| DocType (CAT/DST) | CO_TE |
| BOM categories | CO=1, FN=1, GF=1, L1=1, L2=1, L3=1, L4=1, RF=1 |
| M_Product | 513 total (505 catalog + 8 assembly stubs) |
| Duplicate bom_ids | 0 |
| Orphan lines | 0 |
| Duplicate positions | 2 (WARN, non-blocking) |
| World-coord offsets (>500m) | 0 |
| Non-zero BOM origins | 0 |
| AABB W containment | Floor max 68,930 ≤ building 73,670 |
| AABB D containment | Floor max 56,359 ≤ building 59,124 |
| AABB H envelope | Building 59,818, floor sum 121,249 (103% overlap) |
| Tack: assembly refs valid | 7 refs, all valid |
| Tack: BUILDING children | 7 assembly refs |
| Element refs on LEAF lines | 1,515/1,515 |
| W-TACK-1: LBD convention | 0/1,515 overflows (PASS) |
| W-BUFFER-1: SUM(children) = parent | SKIP (no SET BOMs after flatten) |
| Product-linked LEAF lines | 1,515/1,515 |
| Factorization ratio | 3.0× lines, 95.9× reuse (505 products → 48,428 instances) |
| Extraction reconciliation | 48,428 vs 48,428 (delta=+0) |
| Shape consistency (CP-4) | 1,515 LEAF rows classified |
| Integrity hash | a631bd7864567996 |

Note: W-TACK-1 and W-BUFFER-1 pass trivially after flatten, not by correctness. With no SET BOMs, every LEAF child sits under FLOOR whose AABB is the union of all its elements. W-TACK-1 passes because parent = union(children) by construction. W-BUFFER-1 is skipped because no SET BOMs exist to check. This does not prove tack offsets are compile-ready — only that the flat structure satisfies the validator.
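The "passes trivially" point can be shown on one axis. In this sketch (hypothetical names, intervals as `{min, max}` pairs, not the validator's actual data model), a parent extent computed as the union of its children can never report an overflow, whereas an independently specified parent can.

```java
/** Illustrative only: why containment holds by construction when parent = union(children). */
final class TackContainmentSketch {
    /** Union of 1-D extents: {smallest min, largest max}. Each child is {min, max}. */
    static double[] unionOf(double[][] children) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double[] c : children) {
            lo = Math.min(lo, c[0]);
            hi = Math.max(hi, c[1]);
        }
        return new double[] {lo, hi};
    }

    /** Number of children that stick out of the parent extent. */
    static int overflows(double[] parent, double[][] children) {
        int count = 0;
        for (double[] c : children) {
            if (c[0] < parent[0] || c[1] > parent[1]) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] children = {{0, 5}, {3, 12}, {10, 20}};
        // Parent derived from the children: the check cannot fail.
        System.out.println(overflows(unionOf(children), children)); // 0
        // Parent declared independently (e.g. a SET envelope): the check is meaningful.
        System.out.println(overflows(new double[] {0, 15}, children)); // 1
    }
}
```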

BOM Structure (post-flatten)

BUILDING_TE_STD  (73670×59124×59818mm, origin=84.64/-51.22/-30.69)
├── TE_FDN  (Foundation)  — 157 lines,    703 instances, 6 IFC classes
├── TE_GF   (Ground Floor) — 421 lines,  3,513 instances, 24 IFC classes
├── TE_L01  (Level 1)     — 223 lines,  2,070 instances, 21 IFC classes
├── TE_L02  (Level 2)     — 222 lines,  2,609 instances, 23 IFC classes
├── TE_L03  (Level 3)     — 257 lines,  1,798 instances, 23 IFC classes
├── TE_L04  (Level 4)     — 125 lines,  2,307 instances, 21 IFC classes
└── TE_RF   (Roof)        — 110 lines, 35,428 instances, 23 IFC classes

FLOOR origins are all (0,0,0). Tack chain: BUILDING.origin + MAKE.dx + LEAF.dx = element LBD. Root→child MAKE offsets are LBD-to-LBD (meters, relative to building min).
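The tack chain is a plain vector sum. The sketch below uses the BUILDING origin from the tree above; the MAKE and LEAF offsets are invented for illustration and do not correspond to any real TE line. Class and method names are hypothetical.

```java
/** Illustrative only: BUILDING.origin + MAKE.dx + LEAF.dx = element LBD (all in meters). */
final class TackChainSketch {
    static double[] add(double[] a, double[] b) {
        return new double[] {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
    }

    /** Sums the three links of the chain to get the element's world-space LBD corner. */
    static double[] elementLbd(double[] buildingOrigin, double[] makeOffset, double[] leafOffset) {
        return add(add(buildingOrigin, makeOffset), leafOffset);
    }

    public static void main(String[] args) {
        double[] buildingOrigin = {84.64, -51.22, -30.69}; // from BUILDING_TE_STD above
        double[] makeOffset = {0.0, 0.0, 4.5};             // hypothetical floor offset
        double[] leafOffset = {12.5, 20.0, 0.25};          // hypothetical element offset
        double[] lbd = elementLbd(buildingOrigin, makeOffset, leafOffset);
        System.out.printf("LBD = (%.2f, %.2f, %.2f)%n", lbd[0], lbd[1], lbd[2]);
    }
}
```

Because FLOOR origins are all zero, the floor contributes only through its MAKE offset. No separate floor-origin term appears in the sum.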

Verb Distribution

| Verb | Lines | Instances | % of total |
|---|---|---|---|
| CLUSTER | 345 | 47,157 | 97.4% |
| (null/PLACE) | 1,163 | 1,163 | 2.4% |
| FRAME | 2 | 78 | 0.2% |
| ROUTE | 2 | 18 | <0.1% |
| TILE | 3 | 12 | <0.1% |
| Total | 1,515 | 48,428 | |

Verb factorization is heavily CLUSTER-dominated (97.4%). The 33,324 IfcPlate elements on the Roof (CP-4 archetype) drive this — they're identical panel products tiled across the roof surface.

Compile-Path Blockers (6 questions)

1. Root finding: Yes. Exactly 1 BOM has no parent m_bom_line pointing to it (BUILDING_TE_STD). The compile path can find it without bom_type='BUILDING'.

2. Verb coverage: 352/1,515 lines (47,265/48,428 instances = 97.6%) are verb-factored (CLUSTER/FRAME/ROUTE/TILE); CLUSTER alone accounts for 345 lines and 47,157 instances (97.4%). The remaining 1,163 lines are PLACE (null verb_ref, qty=1). The compile path needs to handle both: verb lines expand to N instances via VerbExpander, PLACE lines emit 1:1.

3. Discipline grouping — BLOCKER: Discipline codes (ARC, STR, FP, etc.) are NOT persisted on LEAF lines. The role column stores IFC class name (IfcWall, IfcPipe, etc.), not discipline code. During IFCtoBOM, DisciplineBomBuilder groups elements by e.discipline() for verb factorization but passes e.ifcClass() to VerbFactorizer.insertLeafLine() as the role parameter. The compile path needs discipline for AD_Org_ID resolution. Fix: Either (a) add a discipline column to m_bom_line, or (b) derive discipline from IFC class via ad_ifc_class_map (ERP.db DV005, 46 rows). Option (b) is already the extraction path — but loses multi-model provenance (same IfcPipeSegment could be FP, CW, or SP depending on which federated model it came from).

4. Tack chain integrity: World position reconstruction (query 2g) produces coordinates matching extraction positions. Sample: 006_ADA_Countertop_and_Sink reconstructs to (128.41, -3.05, -14.65) — consistent with building envelope. Zero negative tacks (query 2f). Tack chain is geometrically sound.

5. Scale concerns: Largest floor = Roof with 110 lines (35,428 instances — dominated by 33,324 IfcPlate). Ground Floor has 421 lines (3,513 instances). These are manageable for BOM walk — the factored representation keeps line count under 500 per floor even at 48K total instances.

6. Missing data:

- element_ref (IFC GUID): 1,515/1,515, fully populated on unfactored PLACE lines. Verb-factored lines carry MA rows mapping qi→GUID.
- material_name: 717/1,515 (47.3%), partial. 798 lines have NULL material.
- orientation: 42/1,515 (2.8%), sparse. Most elements lack orientation data.
- host_element_ref: column exists but not checked.
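The provenance loss in blocker 3, option (b), can be made concrete. The sketch below is an assumption: the map is a hand-written stand-in for ad_ifc_class_map that allows several candidate disciplines per IFC class, and the IfcWall → ARC entry is illustrative. `DisciplineResolverSketch` and `resolve` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Illustrative only: deriving discipline from IFC class is ambiguous for shared classes. */
final class DisciplineResolverSketch {
    // Hypothetical stand-in for ad_ifc_class_map: IFC class -> candidate discipline codes.
    static final Map<String, Set<String>> CANDIDATES = Map.of(
        "IfcWall", Set.of("ARC"),
        "IfcPipeSegment", Set.of("FP", "CW", "SP"));

    /** Returns a discipline only when the IFC class maps to exactly one candidate. */
    static Optional<String> resolve(String ifcClass) {
        Set<String> candidates = CANDIDATES.getOrDefault(ifcClass, Set.of());
        if (candidates.size() == 1) {
            return Optional.of(candidates.iterator().next());
        }
        return Optional.empty(); // unknown class, or ambiguous across federated models
    }

    public static void main(String[] args) {
        System.out.println(resolve("IfcWall"));        // Optional[ARC]
        System.out.println(resolve("IfcPipeSegment")); // Optional.empty
    }
}
```

Option (a), a persisted discipline column on m_bom_line, avoids the empty case because the discipline is recorded when the element's source model is still known.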

What Works Now (post-flatten)

| Asset | Status |
|---|---|
| TE_BOM.db | Populated: 8 BOMs, 1,522 lines, committed |
| _TE_compile.db | Prepared from TE_BOM.db |
| Compile | PASS: DAGCompiler runs (but output = passthrough) |
| G1-G6 gates | All PASS (still extraction-vs-extraction) |
| G0-COMPILED | WARN: c_order=0 (extraction-only, expected) |

What Doesn't Work (compile-path gaps)

| Gap | Current State | What it Means |
|---|---|---|
| c_order | 0 rows in output DB | BomDropper runs but CO passthrough deleted (S100-p66) |
| c_orderline | 0 rows in output DB | No BOM explosion to order lines |
| Discipline on LEAF | Missing: role=IFC class, not discipline | AD_Org_ID unresolvable from BOM alone |
| CompilationPipeline CO skip | Still present | Creates empty BuildingSpec, all elements passthrough |
| Designer access | Would show nothing | No order data to browse |

Why This Wasn't Flagged Earlier (historical, S99)

Four structural blind spots, two now fixed (S100-p67):

  1. Silent skip, not loud fail. ~~return 1 on missing BOM.db~~ FIXED (S100-p67): script now emits FAIL verdict + loud error.

  2. No "was it compiled?" gate. ~~G1-G6 compare extraction-vs-extraction.~~ FIXED (S100-p67): G0-COMPILED gate checks c_order > 0. TE correctly FAILs G0 (extraction-only, c_order=0).

  3. LAST_MILE_PROBLEM.md doesn't track per-building IFCtoBOM status. LMP tracks compilation pipeline gaps (R21-R24), not extraction-to-BOM conversion failures. R25 gap entry added (S100-p67).

  4. LMP §1 (Input=Output) has no compilation prerequisite. The count check runs on whatever output exists. Tracked as CP-5.
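Blind spots 2 and 4 share one remedy: the count comparison should only run after a compilation check. The sketch below is a minimal illustration under that assumption; `GateSketch` and its methods are hypothetical and are not RosettaStoneGateTest code.

```java
/** Illustrative only: a compilation prerequisite evaluated before the count gate. */
final class GateSketch {
    /** G0-style check: some order rows must exist, otherwise nothing was compiled. */
    static boolean g0Compiled(int cOrderRows) {
        return cOrderRows > 0;
    }

    /** G1-style check: output count equals reference count. */
    static boolean g1Count(int outputCount, int referenceCount) {
        return outputCount == referenceCount;
    }

    /** Count equality is only meaningful once G0 has passed. */
    static String verdict(int cOrderRows, int outputCount, int referenceCount) {
        if (!g0Compiled(cOrderRows)) {
            return "FAIL: not compiled (c_order=0)";
        }
        return g1Count(outputCount, referenceCount) ? "PASS" : "FAIL: count mismatch";
    }

    public static void main(String[] args) {
        // Passthrough case: counts match, but no order rows exist.
        System.out.println(verdict(0, 48428, 48428));
        // Compiled case with matching counts.
        System.out.println(verdict(1, 48428, 48428));
    }
}
```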

Code Flow — How TE Passes Through Unchallenged

This trace follows the full class.method path and shows exactly how TE's extraction coordinates reach output.db without compilation. Each step lists the Java/shell entry point, what TE produces at that step, and where the failure or bypass occurs.

STEP 1: EXTRACT (Python — external, not our code)
────────────────────────────────────────────────────
  Entry:  IfcOpenShell federation/extract.py
  Reads:  SJTII_Terminal.ifc (9 federated discipline models)
  Writes: component_library.db
          → I_Element_Extraction: 48,428 rows (active)
          → I_Geometry_Map: mesh vertices + faces
          → M_Product: 505 unique dimensional signatures
  TE:     ✅ PASS — complete, correct, the reference truth

STEP 2: CLASSIFY (YAML — human-authored)
────────────────────────────────────────────────────
  File:   IFCtoBOM/src/main/resources/classify_te.yaml
  Reads:  nothing (pure declaration)
  Declares: building_type=SJTII_Terminal, m_product_category=CO,
            8 disciplines, 7 storey bands, dsl_file
  TE:     ✅ PASS — correct

STEP 3: POPULATE (Java)
────────────────────────────────────────────────────
  Entry:  IFCtoBOMMain.main("--populate", "--classify", yaml)
          → ExtractionPopulator.populate(compConn, buildingType)
  Reads:  component_library.db (I_Element_Extraction)
  Writes: component_library.db (enriched: Z-band storey, is_active,
          M_Product_ID linkage)
  TE:     ✅ PASS — 48,428 active elements, REBAR deactivated

STEP 4: BUILD BOM (Java IFCtoBOM)
────────────────────────────────────────────────────
  Shell:  run_RosettaStones.sh:710
          → mvn exec:java -Dexec.mainClass="com.bim.ifctobom.IFCtoBOMMain"
  Entry:  IFCtoBOMMain.main("--classify", yaml, "--bom-db", "library/TE_BOM.db")
          → IFCtoBOMPipeline.run(yamlPath, bomDbPath, compDbPath, schemaPath)

  Step 4a — ClassificationYaml.load(yamlPath)                    → ✅ PASS
  Step 4b — ExtractionPopulator.populate(compConn, buildingType)  → ✅ PASS (48,428)
  Step 4c — IFCtoBOMPipeline:258
            if ("CO".equals(config.docBaseType()))
              → DisciplineBomBuilder.build(bomConn, config, storeyElements)
            58 BOMs, 1,572 lines, 48,485 instances                → ✅ PASS (in memory)
  Step 4d — BomValidator.validateAndReport(bomConn, ...)          → ❌ FAIL (5 checks)
            DocType: "-_TE" (format bug)                            → prompt 65
            M_Product catalog: 0 (CO path skips registration)      → prompt 65
            Non-zero origin: 1 BOM                                 → prompt 65
            W-TACK-1: 471/1,515 overflows                          → prompt 66
            W-BUFFER-1: 36/50 unbalanced                           → prompt 66
  Step 4e — bomConn.rollback() + throw SQLException               → TE_BOM.db = EMPTY

STEP 5: PREPARE COMPILE DB (Shell)
────────────────────────────────────────────────────
  Entry:  run_RosettaStones.sh:133 prepare_compile_db()
  Line 146: if [ ! -f "$bom_db" ]; then return 1
  TE:     ⚠️ SILENT SKIP — TE_BOM.db missing, returns 1
          Script continues. No verdict logged. No FAIL.

STEP 5b: BOM DROP (Java — NEVER REACHED for TE)
────────────────────────────────────────────────────
  Entry:  BuildingRegistryTest.java:78
          → BomDropper.drop(compileDb, entry)
  What it does: Walks m_bom/m_bom_line → creates C_Order + C_OrderLine tree
  TE:     ❌ NEVER RUNS — no compile DB → no test invocation
          c_order = 0, c_orderline = 0

STEP 6: COMPILE (Java DAGCompiler — 12-stage pipeline)
────────────────────────────────────────────────────
  Shell:  run_RosettaStones.sh:210 compile_building()
          → mvn test -Dtest="BuildingRegistryTest" -Dbom.db="${compile_db}"
  Entry:  BuildingRegistryTest → CompilationPipeline.run(entry)

  The 12 stages (CompilationPipeline.java:56-66):
    Stage 1: MetadataValidator    — referential integrity
    Stage 2: ParseStage           — DSL → BuildingDefinition
    Stage 3: CompileStage         — compile → BuildingSpec
    ┌─────────────────────────────────────────────────────────────┐
    │ ❌ ILLICIT CODE — CompilationPipeline.java:352-354          │
    │                                                             │
    │   if ("CO".equals(ctx.entry().mProductCategoryId())) {      │
    │       ctx.setSpec(new BuildingSpec(name, List.of(), null));  │
    │       return true;  // SKIP                                 │
    │   }                                                         │
    │                                                             │
    │ Creates EMPTY BuildingSpec (0 storeys, no roof).             │
    │ Violations:                                                 │
    │   Anti-Drift §1 — magic coordinates                        │
    │   DriftGuardTest D6 — hardcoded category branch             │
    │   LMP §7 — input = output                                  │
    │ Fix: DELETE this block (prompt 66 Step 6)                   │
    └─────────────────────────────────────────────────────────────┘
    Stage 4: TemplateStage        — ST mode only (skipped)
    Stage 5: WriteStage           — write to output.db
    ┌─────────────────────────────────────────────────────────────┐
    │ ❌ PASSTHROUGH — BuildingWriter.java:865                    │
    │   emitGlobalPlacementElements(spec)                         │
    │                                                             │
    │ PlacementLoader.load()                                      │
    │   → hasOrderLineData() = false (c_order=0)                  │
    │   → loadFromBOM()                                           │
    │     → MBOM.getRoots(conn)                                    │
    │     → BOMWalker.walkSelf(bomId, visitors, buildingType)     │
    │     → PlacementCollectorVisitor.getPlacements()              │
    │       → 48,428 Placement records with extraction coords     │
    │                                                             │
    │ BuildingSpec has 0 storeys → nothing consumed by             │
    │ StoreyCompiler → PlacementLoader.isConsumed() = false       │
    │ for ALL elements → emitGlobalPlacementElements() emits      │
    │ ALL 48,428 as-is → extraction coords copied to output       │
    └─────────────────────────────────────────────────────────────┘
    Stage 6: VerbStage            — BIM COBOL (runs but no effect)
    Stage 7: DigestStage          — spatial digest
    Stage 8: GeometryStage        — geometry integrity
    Stage 9: ProveStage           — placement proofs

  TE output: 48,428 elements with extraction coordinates.
  No BOM compilation occurred. Output = input.

STEP 7: VERIFY (Shell + Java gates)
────────────────────────────────────────────────────
  Shell:  run_RosettaStones.sh:732 run_integrity()
  Java:   RosettaStoneGateTest (G1-G6)

  G1-COUNT:      48,428 output = 48,428 reference  → PASS (trivially)
  G2-VOLUME:     same AABB as extraction            → PASS (trivially)
  G3-DIGEST:     same hash as extraction            → PASS (trivially)
  G4-TAMPER:     source scan (no DB dependency)     → PASS (real)
  G5-PROVENANCE: checks geometry resolution         → PASS (trivially)
  G6-ISOLATION:  cross-DB join guard                → PASS (real)

  TE:     ⚠️ All PASS — but G1/G2/G3/G5 are comparing a thing to itself.
          No G0-COMPILED gate exists to check c_order > 0.
          Fix: prompt 67

FINE Logging Recommendation

Once the fix prompts are applied, add FINE-level guards that would have caught this earlier:

// In CompilationPipeline.CompileStage.execute():
BIMLogger.fine("COMPILE", "CompileStage: category={}, storeys={}, hasRoof={}",
    ctx.entry().mProductCategoryId(),
    ctx.spec().storeys().size(),
    ctx.spec().roof() != null);

// In BuildingWriter.emitGlobalPlacementElements():
int totalPlacements = allPlacements.size();
int consumed = (int) allPlacements.stream()
    .filter(p -> PlacementLoader.getInstance().isConsumed(p.buildingType(), p.elementRef()))
    .count();
BIMLogger.fine("EMIT", "emitGlobal: total={}, consumed={}, emitting={}",
    totalPlacements, consumed, totalPlacements - consumed);
if (consumed == 0 && totalPlacements > 100) {
    BIMLogger.warn("EMIT", "[SUSPECT] 0/{} consumed — entire output is passthrough. "
        + "Was CompileStage skipped?", totalPlacements);
}

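// In PlacementLoader.load():
// Log the boolean and the chosen path separately so each placeholder matches its value.
boolean orderLinePath = hasOrderLineData();
BIMLogger.fine("PLACEMENT", "PlacementLoader: hasOrderLineData={}, path={}, bomDb={}",
    orderLinePath,
    orderLinePath ? "OrderLine" : "BOM-direct",
    System.getProperty("bom.db"));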

The [SUSPECT] line is emitted with BIMLogger.warn, next to the FINE-level counters. It would flag any future building where emitGlobalPlacementElements() emits everything and nothing was consumed, which is the exact signature of the TE passthrough.

Fix Path — Priority Order (updated S100-p69)

Phase 1 — Mechanical fixes: DONE

| # | Fix | Status |
|---|---|---|
| 1 | DocType format: -_TE → CO_TE | DONE (S99) |
| 2 | Non-zero BOM origin: exclude BUILDING | DONE (S100-p65) |
| 3 | M_Product catalog: register leaf products | DONE (S100-p65) |

Phase 2 — Tack convention: DONE (via flatten)

| # | Fix | Status |
|---|---|---|
| 4 | W-TACK-1: 471→0 overflows | DONE (S100-p66): SET level removed, LEAF under FLOOR |
| 5 | W-BUFFER-1: 36 unbalanced | DONE (S100-p66): no SET BOMs to check (SKIP) |

Phase 3 — Verification hardening: DONE

| # | Fix | Status |
|---|---|---|
| 6 | Script fail-loud on missing BOM.db | DONE (S100-p67) |
| 7 | G0-COMPILED gate | DONE (S100-p67): TE correctly FAILs (c_order=0) |

Phase 4 — Compile-path enablement: DONE (S100-p71/p72)

| # | Fix | Status |
|---|---|---|
| 8 | Discipline on LEAF lines | DONE (S100-p71): AD_Org resolves from product, not line |
| 9 | Remove CO passthrough | DONE (S100-p72): BOM walk replaces shouldSkip |
| 10 | BomDrop for CO buildings | DONE (S100-p72): single path, verb-dispatched |

Phase 5 — iDempiere PK conformance (prompt 86):

| # | Fix | Approach |
|---|---|---|
| 11 | m_bom M_BOM_ID INTEGER PK | Phase A: DONE (S100-p86). 65 Java refs migrated. bom_id → Value. |
| 12 | M_Product_Category INTEGER PK | Phase B: DONE. 135 Java refs migrated. Category codes → Value. |
| 13 | 13 AD tables INTEGER PKs | Phase C: DONE. Composite PKs got surrogate _ID. |

Why Phase 5 matters for TE: TE's 48,428 elements traverse 8 BOMs via bom_id TEXT. Every BOMWalker.walk(), BomDropper.explode(), and PlacementCollectorVisitor.onSubAssembly() passes bom_id as String. Migrating to INTEGER FK will:

- Flush hidden string-concatenation assumptions in walker code
- Expose hardcoded category codes ("RE", "CO") that should be Value lookups
- Verify IFCtoBOM DDL matches the new schema (re-extraction test)
- Prove the pipeline is PK-type-agnostic (same output, different key type)

Verification exercise: After each phase of prompt 86, run TE through the full pipeline. The FINE logs (prompt 85) will show whether INTEGER PKs flow correctly through BomDrop → BOM walk → WriteStage. SH is the canary (7/7 must hold). TE is the stress test (48K elements, 8 disciplines). Any TEXT/INTEGER mismatch will surface as a gate failure or exception in the FINE log — that's the point.

After Phase 5: All tables follow iDempiere convention. _ID is opaque INTEGER (never shown to users). Value is the search key. Name is the display name. FKs reference _ID. DB integrity enforced at the schema level.
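The _ID / Value split can be illustrated with an in-memory lookup. This is a sketch under stated assumptions: `CategoryLookupSketch`, its methods, and the starting ID of 1000000 are hypothetical, and the real tables live in the database, not in a map. The point is that code resolves a category by its Value once and then carries only the opaque integer.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: Value is the search key, _ID is an opaque integer used for references. */
final class CategoryLookupSketch {
    private final Map<String, Integer> idByValue = new HashMap<>();
    private int nextId = 1000000; // hypothetical starting point for surrogate IDs

    /** Registers a Value if it is new and returns its surrogate ID (idempotent). */
    int register(String value) {
        return idByValue.computeIfAbsent(value, v -> nextId++);
    }

    /** Resolves a Value to its ID; callers keep the int instead of comparing strings. */
    int idOf(String value) {
        Integer id = idByValue.get(value);
        if (id == null) {
            throw new IllegalArgumentException("unknown category Value: " + value);
        }
        return id;
    }

    public static void main(String[] args) {
        CategoryLookupSketch categories = new CategoryLookupSketch();
        int co = categories.register("CO");
        int re = categories.register("RE");
        // Downstream code compares integer IDs, not the literals "CO" / "RE".
        System.out.println(categories.idOf("CO") == co); // true
        System.out.println(co != re);                    // true
    }
}
```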


Cross-references: BBC.md §1.8 | BOMBasedCompilation.md §3-§4 (governing spec) | InfrastructureAnalysis.md | terminal_erd.html (interactive ERD) | bim_architecture_viz.html (4-DB architecture) | LAST_MILE_PROBLEM.md (Gap 6: verb step-uniformity) | BIM_COBOL.md (verb taxonomy + data flow)