Hi Yi, In this specific case ordering is declared in the schema. Quoting from Calcite documentation
~~~~~~~~~~~~~~~~~~~~ Monotonic columns need to be declared in the schema. The monotonicity is enforced when records enter the stream and assumed by queries that read from that stream. We recommend that you give each stream a timestamp column called rowtime, but you can declare others, orderId, for example. ~~~~~~~~~~~~~~~~~~~~ If we can propagate this ordering information to LogicalAggregate then we can easily handle this. As I understand required information is accessible to Calcite query planner. But in our case we need this information after we get the query plan from Calcite. AFAIK, current API doesn't provide a way to get this information in scenarios like above where ORDER BY is not specified in the query (I am not 100% sure about ORDER BY case too. I need to have a look at a query plan generated for a query with ORDER BY). Thanks Milinda On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan <nickpa...@gmail.com> wrote: > Hi, Milinda, > > I thought that in your example, the ordering field is given in GROUP BY. > Are we missing a way to pass the ordering field(s) to the LogicalAggregate? > > -Yi > > On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage <mpath...@umail.iu.edu > > > wrote: > > > Hi Julian, > > > > Even though this is a general question across all the streaming > aggregates > > which utilize GROUP BY clause and a monotonic timestamp field for > > specifying the window, but I am going to stick to most basic example > (which > > is from Calcite Streaming document). > > > > SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime, > > productId, > > COUNT(*) AS c, > > SUM(units) AS units > > FROM Orders > > GROUP BY FLOOR(rowtime TO HOUR), productId; > > > > I was trying to implement an aggregate operator which handles tumbling > > windows via the monotonic field in GROUP By clause in addition to the > > general aggregations. I went in this path because I thought integrating > > windowing aspects (at least for tumbling and hopping) into aggregate > > operator will be easier than trying to extract the window spec from the > > query plan for a query like above. But I hit a wall when trying to figure > > out trigger condition for emitting aggregate results. I was initially > > planning to detect new values for FLOOR(rowtime TO HOUR) and emit current > > aggregate results for previous groups (I was thinking to keep old groups > > around until we clean them up after a timeout). But when trying to > > implement this I figured out that I don’t know how to check which GROUP > BY > > field is monotonic so that I only detect new values for the monotonic > > field/fields, not for the all the other fields. I think this is not a > > problem for tables because we have the whole input before computation and > > we wait till we are done with the input before emitting the results. > > > > With regards to above can you please clarify following things: > > > > - Is the method I described above for handling streaming aggregates make > > sense at all? > > - Is there a way that I can figure out which fields/expressions in > > LogicalAggregate are monotonic? > > - Or can we write a rule to annotate or add extra metadata to > > LogicalAggregate so that we can get monotonic fields in the GROUP By > clause > > > > Thanks in advance > > Milinda > > > > > > -- > > Milinda Pathirage > > > > PhD Student | Research Assistant > > School of Informatics and Computing | Data to Insight Center > > Indiana University > > > > twitter: milindalakmal > > skype: milinda.pathirage > > blog: http://milinda.pathirage.org > > > -- Milinda Pathirage PhD Student | Research Assistant School of Informatics and Computing | Data to Insight Center Indiana University twitter: milindalakmal skype: milinda.pathirage blog: http://milinda.pathirage.org