Good question - thanks for asking. The short answer is that measures that span tables work, but they aren’t quite as convenient as measures in a single table.
When designing Measures in SQL I knew that measures sometimes span tables - star schemas are a case in point - but decided to make measures belong to one table for simplicity. This allowed me to put them into the same namespace as columns, so I can write ’select e.avg_sal from emp as e’. (Measures are not columns, but SQL semantic analyzers already know how to look up a column within the scope of a table.) My goal was to allow dimensional expressions, such as revenue-this-year minus revenue-last-year, so we need to know what are the dimensions that affect the value of a measure. By associating a measure with a table we can say that the (non-measure) columns of that table are its dimensions. One case of 'measures spanning tables’ is joining a table with a measure to a regular table. For example, suppose the orders table has an avg_shipping_cost measure, and we write this query: select o.zipcode, o.avg_shipping_cost, count(distinct i.color) from orders as o join orderItems as i on o.id <http://o.id/> = i.orderId group by o.zipcode If I’d written avg(o.shipping_cost), orders with many items would be weighted more heavily in the total than orders with few items. But measures are ‘locked’ to the grain of their table. This is a crucial property of measures. It allows me to create a wide join view over a star schema - say of orders, order-items, customers and products - and the measures in that view will be well-behaved. Users get the benefits of the join without having to remember the right join conditions. By the way, if I really want a weighted average shipping, I can unpack convert the shipping-cost measure into a value, join it to the order-items, and then re-package it as a measure again: select o.zipcode, avg(o.avg_shipping_cost) as measure weighted_avg_shipping_cost, count(distinct i.color) from orders as o join orderItems as i on o.id <http://o.id/> = i.orderId group by o.zipcode Now, another case of ‘measures spanning tables’ is common dimensions (also known as conformed dimensions). Let’s suppose I have an orders table and an inventory table, both with measures, and both with a date column (orderDate and inventoryDate) and a reference to a product. I can write a query that joins these tables: select o.sum_revenue, v.total_on_hand from orders as o join inventory as v on o.orderDate = v.inventoryDate and o.productId = v.productId where o.color = ‘Red' This is a many-to-many join — like joining two cubes in an OLAP system — but the measures’ grain-locking ensures prevents double-counting. Notice that I have had to manually put the common dimensions into a join condition. There is nothing in the definition of the orders.sum_revenue and inventory.total_on_hand measures that connects their date dimensions. I have to remember to make that connection. Where to store those definitions? I don’t know. I don’t think it would be a good idea to build a view on this many-to-many join (too many columns, too much complexity). One idea is to store these common dimensions in the semantic layer. Perhaps a 'semantic layer' should have entities like ‘product’, ‘date’, ‘customer’ and record which columns on the ‘cube’ tables connect to each entity. And then the semantic layer can help people to write SQL. In my opinion, measures are not really the semantic layer — they are already there, as part of the tables, but their existence means that the semantic layer doesn’t have to worry about grain and double-counting, and so can generate simpler SQL. Julian > On Sep 30, 2024, at 3:00 PM, Barak Alon <barak.s.a...@gmail.com> wrote: > > Hey there - > > I've found my way here via Measures in SQL > <https://dl.acm.org/doi/pdf/10.1145/3626246.3653374>. I'm a long time > admirer of Calcite, but I'm not very familiar with internals - apologies if > I stumble. > > I work on Airbnb's Minerva > <https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70> > project, > an internal semantic layer for star schemas. Measures in SQL is a > wonderfully elegant idea, and I've started exploring if we can introduce > them to our system. > > However, our users often write queries that span multiple tables/stars, > requiring a drill across operation. Our system handles this by exposing a > virtual table that pretends as if our entire semantic layer is a single, > wide fact table. Incompatible measure/dimension combinations result in > runtime errors. > > This need feels pretty common and a logical extension to some of Measures' > strengths - making SQL more powerful and expressive for "top down" queries. > > I'm curious - are there plans to support measures that span multiple tables > in Calcite? > > - Barak