Hi Calcite community, I recently asked a related question on [CALCITE-4496][1]. I would like to bring the discussion to the dev list because the question is not only about one JIRA item or implementation detail, but about the abstraction boundary of Measure in SQL itself.
Over the past quarter, I have been exploring semantic layer / semantic modeling DSL design. During this process, I looked at adjacent systems and ideas such as [Looker][3], [Cube][4], [Malloy][5], [dbt Semantic Layer / MetricFlow][6], and Calcite’s measure work. These systems all point to the same need: reusable business calculations and governed analytical models. But they draw the boundary differently. Some keep measures in an external semantic model; Calcite’s SQL measure work moves reusable calculations into SQL itself. That boundary is the part I would like to discuss. I read [CALCITE-4496][1] and Julian Hyde / John Fremlin’s paper [“Measures in SQL”][2]. The core motivation makes sense to me. Traditional SQL views cannot preserve reusable aggregate calculations very well. Once a view turns `AVG`, ratios, margins, or similar calculations into ordinary columns, downstream roll-ups can lose the original calculation semantics. The classic example is “average of averages”. `MEASURE` tries to solve this by making a measure column not just a materialized value, but a calculation that can be re-evaluated in the outer query context. This is a powerful idea. My question is about where this abstraction should live. In traditional SQL, a view can be understood as a relation. It produces rows and columns; columns are values. Even if the view definition is complex, downstream queries can treat it as a black-box relation. A measure-bearing view is different. Some columns are regular row values, while some columns are deferred aggregate calculations that can be re-evaluated in an outer query context. The view is no longer only a relation in the traditional value-level sense; it also carries reusable calculation semantics. That leads to a few questions. 1. Is a measure-bearing view still best understood as a relation, or as a relation plus semantic calculation metadata? In semantic modeling systems, dimensions and measures are different kinds of objects. A dimension describes rows and can be selected, grouped, filtered, or displayed. A measure describes how to compute an aggregate over a set of rows, and its value depends on query context. If SQL schema contains both regular columns and measure columns, users and planners need to understand that they are not the same kind of column. Is this the intended mental model for SQL tables with measures? 2. Should `MEASURE` live in SQL schema/plans, or remain external semantic metadata expanded before planning? For semantic layers built on Calcite, there seem to be two possible directions: ```text semantic model -> SQL with MEASURE -> Calcite expands / optimizes measures ``` or: ```text semantic model -> ordinary SQL -> Calcite optimizes relational SQL ``` The first direction gives SQL a native reusable-calculation abstraction. The second keeps SQL closer to the traditional relational model, and leaves semantic complexity in the modeling layer. I am trying to understand which direction better matches Calcite’s long-term goal. 3. If `MEASURE` lives in SQL, how broad should its evaluation context become? The paper defines measures using context-sensitive expressions, evaluation context, and the `AT` operator. That model is expressive. In real semantic layer systems, query context often includes more than `GROUP BY`. It may include: - filters before aggregation vs filters after aggregation; - joins that may change grain; - time dimensions, time grain, and timezone rules; - hidden dimensions or filter-only fields; - policy filters and query parameters; - derived views that preserve or reshape dimensionality. Should SQL `MEASURE` stay focused on reusable aggregate calculations and their evaluation context, while these broader semantic-layer concepts remain external metadata? Or is the long-term direction for SQL measures to cover more of this modeling context? For context only, here is a design note from my semantic modeling exploration. It is not a Calcite syntax proposal; it is only background for why I am thinking about this boundary: [Designing a Semantic Modeling DSL][7] I would appreciate feedback on this tradeoff: is `MEASURE` the right abstraction boundary for SQL, or should measures remain primarily a semantic-layer concept that compiles down to ordinary SQL? References: [1]: https://issues.apache.org/jira/browse/CALCITE-4496 [2]: https://arxiv.org/pdf/2406.00251 [3]: https://cloud.google.com/looker/docs/reference/param-explore [4]: https://cube.dev/docs/product/data-modeling/concepts [5]: https://docs.malloydata.dev/documentation [6]: https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl [7]: https://github.com/caicancai/semantic-modeling-dsl/blob/main/blog/designing-a-semantic-modeling-dsl-en.md
