Hi Calcite devs, I’m exploring Apache Calcite as the SQL generator for ACBP, a policy compiler that turns categorical/boolean models into deterministic, auditable SQL artifacts (views/CTAS) with formal checks (soundness/coverage/dedup).
Public overview (no private IP): - Overview: https://dotkboy-web.github.io/acbp/ - Theorems: https://dotkboy-web.github.io/acbp/acbp_theorems.html - ACBP Equation: https://dotkboy-web.github.io/acbp/ACBP-Equation.html - DOI snapshot: https://zenodo.org/records/16891549 Why ACBP? (relevance to Calcite) - Realistic workload: CASE-heavy categorical logic over clean tables, no vendor-specific UDFs → ideal for multi-dialect unparse coverage. - Deterministic semantics: decisions must be identical across dialects → strong target for correctness/golden tests of RelToSql/dialects. - Governance-first: compiler proves properties (soundness/coverage/dedup) and keeps decisions auditable; portability matters for users running multiple engines. - Clean separation: ACBP owns the IR (categories/flags/rules); Calcite prints dialect SQL. Any gaps can become small PRs in sql.dialect. - Give back: happy to contribute a tiny HL7 suite of golden tests (input plan → expected strings) and open JIRAs for dialect gaps. Why this helps HL7/HIS (benefits & what’s novel) - Operational: deterministic routing and noise control (silencing windows, dedup, escalation ladders) encoded as SQL CASE/joins. - Governance/audit: soundness/coverage/dedup checks run in-DB; every action has explainable predicates + policy version for replay. - Engineering: one IR, many dialects (PostgreSQL/BigQuery/ClickHouse/Spark) via Calcite; DDL templated per dialect. - KPIs: fewer duplicate/low-value alerts, faster policy iteration with guardrails, and stable sub-second P95 on “what to do now” boards. - Novelty: not “rules in healthcare” per se―the combination of a SQL-native policy compiler + formal coverage/soundness/dedup + decision-space enumeration + multi-dialect generation is distinct from interface/rules engines. Goal Given an HL7-derived model (categories + flags + deterministic rules), build a Calcite plan once and unparse to multiple SQL dialects to produce “decision space” artifacts, preserving ACBP semantics. What “ACBP semantics” means here - Categories (finite enums) + derived boolean flags drive a single deterministic CASE that yields an action (throttle/alert/escalate/...). - The compiler guarantees governance properties and keeps decisions auditable in-DB. - SELECT/CASE is the portable core; DDL (CTAS/REPLACE) is wrapped by small templates per dialect. HL7 → categories/flags (representative mapping) - ADT: message_type='ADT', trigger_event ∈ {A01,A04,A08,A03}, patient_class ∈ {E,I,O} Flags: is_admission := trigger_event IN ('A01','A04'); is_discharge := trigger_event='A03'; is_emergency := patient_class='E'. - ORM: message_type='ORM', order_priority ∈ {STAT,ROUTINE}, order_type (LAB/RX/PROC) Flags: is_stat_order := order_priority='STAT'; is_critical_order := order_type IN ('LAB','PROC'). - ORU: message_type='ORU', obs_abn_flag (OBX-8: H/L/N), loinc_code Flags: abnormal_result := obs_abn_flag IN ('H','L'); critical_analyte := loinc_code IN ref_critical_loinc. MWE (shape representative of ACBP output) -- Schema (selected fields; staging joins handled upstream) CREATE TABLE hl7_messages ( msg_id BIGINT, event_ts TIMESTAMP, message_type VARCHAR, -- ADT, ORM, ORU trigger_event VARCHAR, -- A01, O01, R01, ... patient_class VARCHAR, -- PV1-2: E/I/O order_priority VARCHAR, -- STAT, ROUTINE loinc_code VARCHAR, obs_abn_flag VARCHAR -- OBX-8: H/L/N ); -- Deterministic policy (CASE over categories/flags): SELECT msg_id, event_ts, CASE WHEN (message_type = 'ORM' AND order_priority = 'STAT' AND obs_abn_flag IN ('H','L')) THEN 3 -- escalate WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L') AND loinc_code IN (SELECT code FROM ref_critical_loinc)) THEN 3 WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')) THEN 2 -- alert WHEN (message_type = 'ADT' AND trigger_event IN ('A01','A04') AND patient_class = 'E') THEN 2 ELSE 1 -- throttle END AS action_id FROM hl7_messages WHERE event_ts >= CURRENT_TIMESTAMP - INTERVAL '2' DAY; Smallest viable integration surface 1) Build relational algebra with RelBuilder (scan/filter/project/CASE; no non-portable ops). 2) Convert RelNode → SqlNode via RelToSqlConverter. 3) Unparse with SqlDialect + SqlPrettyWriter per target dialect. 4) Keep DDL templated per dialect (CTAS/OR REPLACE varies); the SELECT is the portable core. Notes on portability - I can express the time filter as DATEADD/TIMESTAMP_SUB if a dialect prefers that form. - I avoid bitwise logic in the core SELECT; boolean flags are explicit predicates. - Packed masks (if needed for analytics/storage) can be materialized as a follow-up step per dialect. Questions 1) Is RelBuilder → RelToSqlConverter → SqlDialect the recommended path for multi-dialect generation today? Any caveats? 2) For interval arithmetic, would you represent a generic DATEADD/TIMESTAMP_SUB in the IR and let SqlDialect map it? 3) Guidance for ensuring CASE-heavy categorical logic unparses idiomatically across Postgres/BigQuery/ClickHouse/Spark? 4) If we hit a dialect gap, what’s the smallest PR you prefer (tests + changes in org.apache.calcite.sql.dialect.*)? I can provide a tiny repo with: - input plan construction (RelBuilder), - expected SQL strings per dialect (golden tests), - and JIRAs for any concrete gaps we uncover. Thanks, Muteb (DotK) Asia/Riyadh Note: I’m awaiting subscription confirmation; please keep me CC’d on replies.