Hi Calcite devs,

I’m exploring Apache Calcite as the SQL generator for ACBP, a policy compiler 
that turns categorical/boolean models into deterministic, auditable SQL 
artifacts (views/CTAS) with formal checks (soundness/coverage/dedup).

Public overview (no private IP):
- Overview: https://dotkboy-web.github.io/acbp/
- Theorems: https://dotkboy-web.github.io/acbp/acbp_theorems.html
- ACBP Equation: https://dotkboy-web.github.io/acbp/ACBP-Equation.html
- DOI snapshot: https://zenodo.org/records/16891549

Why ACBP? (relevance to Calcite)
- Realistic workload: CASE-heavy categorical logic over clean tables, no 
vendor-specific UDFs → ideal for multi-dialect unparse coverage.
- Deterministic semantics: decisions must be identical across dialects → strong 
target for correctness/golden tests of RelToSql/dialects.
- Governance-first: compiler proves properties (soundness/coverage/dedup) and 
keeps decisions auditable; portability matters for users running multiple 
engines.
- Clean separation: ACBP owns the IR (categories/flags/rules); Calcite prints 
dialect SQL. Any gaps can become small PRs in sql.dialect.
- Give back: happy to contribute a tiny HL7 suite of golden tests (input plan → 
expected strings) and open JIRAs for dialect gaps.

Why this helps HL7/HIS (benefits & what’s novel)
- Operational: deterministic routing and noise control (silencing windows, 
dedup, escalation ladders) encoded as SQL CASE/joins.
- Governance/audit: soundness/coverage/dedup checks run in-DB; every action has 
explainable predicates + policy version for replay.
- Engineering: one IR, many dialects (PostgreSQL/BigQuery/ClickHouse/Spark) via 
Calcite; DDL templated per dialect.
- KPIs: fewer duplicate/low-value alerts, faster policy iteration with 
guardrails, and stable sub-second P95 on “what to do now” boards.
- Novelty: not “rules in healthcare” per se―the combination of a SQL-native 
policy compiler + formal coverage/soundness/dedup + decision-space enumeration 
+ multi-dialect generation is distinct from interface/rules engines.

Goal
Given an HL7-derived model (categories + flags + deterministic rules), build a 
Calcite plan once and unparse to multiple SQL dialects to produce “decision 
space” artifacts, preserving ACBP semantics.

What “ACBP semantics” means here
- Categories (finite enums) + derived boolean flags drive a single 
deterministic CASE that yields an action (throttle/alert/escalate/...).
- The compiler guarantees governance properties and keeps decisions auditable 
in-DB.
- SELECT/CASE is the portable core; DDL (CTAS/REPLACE) is wrapped by small 
templates per dialect.

HL7 → categories/flags (representative mapping)
- ADT: message_type='ADT', trigger_event ∈ {A01,A04,A08,A03}, patient_class ∈ 
{E,I,O}
  Flags: is_admission := trigger_event IN ('A01','A04'); is_discharge := 
trigger_event='A03'; is_emergency := patient_class='E'.
- ORM: message_type='ORM', order_priority ∈ {STAT,ROUTINE}, order_type 
(LAB/RX/PROC)
  Flags: is_stat_order := order_priority='STAT'; is_critical_order := 
order_type IN ('LAB','PROC').
- ORU: message_type='ORU', obs_abn_flag (OBX-8: H/L/N), loinc_code
  Flags: abnormal_result := obs_abn_flag IN ('H','L'); critical_analyte := 
loinc_code IN ref_critical_loinc.

MWE (shape representative of ACBP output)
-- Schema (selected fields; staging joins handled upstream)
CREATE TABLE hl7_messages (
  msg_id BIGINT,
  event_ts TIMESTAMP,
  message_type VARCHAR,     -- ADT, ORM, ORU
  trigger_event VARCHAR,    -- A01, O01, R01, ...
  patient_class VARCHAR,    -- PV1-2: E/I/O
  order_priority VARCHAR,   -- STAT, ROUTINE
  loinc_code VARCHAR,
  obs_abn_flag VARCHAR      -- OBX-8: H/L/N
);

-- Deterministic policy (CASE over categories/flags):
SELECT
  msg_id,
  event_ts,
  CASE
    WHEN (message_type = 'ORM' AND order_priority = 'STAT'
          AND obs_abn_flag IN ('H','L')) THEN 3                    -- escalate
    WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')
          AND loinc_code IN (SELECT code FROM ref_critical_loinc)) THEN 3
    WHEN (message_type = 'ORU' AND obs_abn_flag IN ('H','L')) THEN 2 -- alert
    WHEN (message_type = 'ADT' AND trigger_event IN ('A01','A04')
          AND patient_class = 'E') THEN 2
    ELSE 1                                                          -- throttle
  END AS action_id
FROM hl7_messages
WHERE event_ts >= CURRENT_TIMESTAMP - INTERVAL '2' DAY;

Smallest viable integration surface
1) Build relational algebra with RelBuilder (scan/filter/project/CASE; no 
non-portable ops).
2) Convert RelNode → SqlNode via RelToSqlConverter.
3) Unparse with SqlDialect + SqlPrettyWriter per target dialect.
4) Keep DDL templated per dialect (CTAS/OR REPLACE varies); the SELECT is the 
portable core.

Notes on portability
- I can express the time filter as DATEADD/TIMESTAMP_SUB if a dialect prefers 
that form.
- I avoid bitwise logic in the core SELECT; boolean flags are explicit 
predicates.
- Packed masks (if needed for analytics/storage) can be materialized as a 
follow-up step per dialect.

Questions
1) Is RelBuilder → RelToSqlConverter → SqlDialect the recommended path for 
multi-dialect generation today? Any caveats?
2) For interval arithmetic, would you represent a generic DATEADD/TIMESTAMP_SUB 
in the IR and let SqlDialect map it?
3) Guidance for ensuring CASE-heavy categorical logic unparses idiomatically 
across Postgres/BigQuery/ClickHouse/Spark?
4) If we hit a dialect gap, what’s the smallest PR you prefer (tests + changes 
in org.apache.calcite.sql.dialect.*)?

I can provide a tiny repo with:
- input plan construction (RelBuilder),
- expected SQL strings per dialect (golden tests),
- and JIRAs for any concrete gaps we uncover.

Thanks,
Muteb (DotK)
Asia/Riyadh

Note: I’m awaiting subscription confirmation; please keep me CC’d on replies.

Reply via email to