Thanks Danny for your summary!

For question one, I prefer the ordering of parameter in option two.
However, for argument names, I think it might be better to
1. name the window size explicitly by "window_size".
2. rename the hop_size to emit_frequency. Because that's what this
argument tries to say: e.g. emit in every x time_unit.
3. don't use "table" as the first argument name, because "table" is a
keyword. So we can replace it by "data"

so it might be
TUMBLE(data, DESCRIPTOR(time_column), window_size[, offset])
HOP(data, DESCRIPTOR(time_column), window_size, emit_frequency[, offset])

Meanwhile, one thing worths mentioning is, like Viliam always discussed in
CALCITE-4000[1], we can also finalize the data types for each signature:
TUMBLE(
data TABLE,
time_column DESCRIPTOR,
window_size INTERVAL DAY TO SECOND,
offset INTERVAL DAY TO SECOND DEFAULT INTERVAL '0' SECONDS)

HOP(
data TABLE,
time_column DESCRIPTOR,
window_size INTERVAL DAY TO SECOND,
emit_frequency INTERVAL DAY TO SECOND,
offset INTERVAL DAY TO SECOND DEFAULT INTERVAL '0' SECONDS)

So those intervals have to be INTERVAL DAY TO SECOND, not INTERVAL YEAR TO
MONTH, because year to month varies in seconds.


Regarding question two, I tried to dig into codebase and I think it could
be a RexCorrelVariable to replace that table_name, so it becomes
table_function(RexCorrelVariable, DESCRIPTOR($1), internal:INTERVAL MINUTE)

I know RexCorrelVariable might be not designed for this purpose, but I
find RexCorrelVariable is really useful because:
1. It can provide an id, to indicate which input of TableFunctionScan that
this table_function is applied on. Note that TableFunctionScan might not
have one input.
2. It can provide RelDataType, to save the referenced input's schema. The
input schema is useful during operator argument type validation (for
DESCRIPTOR) and operator return type inference.


[1]:
https://jira.apache.org/jira/browse/CALCITE-4000?focusedCommentId=17107665&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17107665


-Rui

On Thu, Jun 4, 2020 at 6:02 AM Danny Chan <yuzhao....@gmail.com> wrote:

> Hi, all, in the last 2 releases, Rui Wang has contributed the window table
> functions to replace the legacy group window syntax, the idea comes from
> the SQL-2016 polymorphic table functions. But the current implementation
> also got some feedbacks(or confusion), this thread tries to have a
> discussion on theses questions that are undecided.
>
> One is about the window function argument names:
>
> === option1
> TUMBLE(table, DESCRIPTOR(datetime), interval, offset)
> HOP(table, DESCRIPTOR(time_column), slide, size)
>
> === option2
> TUMBLE(table, DESCRIPTOR(time_column), size, offset)
> HOP(table, DESCRIPTOR(time_column), size, hop_size, offset)
>
> I would prefer for option2 for the reasons pointed out by Viliam in
> CALCITE-3737.
>
>
> Another is about how to translate the query
>
> For query, select * from TUMBLE(table, DESCRIPTOR(time_column), interval);
>
> Currently our outputs plan is:
>
> LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2],
> window_end=[$3])
> LogicalTableFunctionScan(invocation=[TUMBLE(DESCRIPTOR($1),
> interal:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID,
> TIMESTAMP(0) time_column, TIMESTAMP(0) window_start, TIMESTAMP(0)
> window_end)])
> LogicalProject(ORDERID=[$0], ROWTIME=[$1])
> LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
>
> But this seems not right, because the "table table_name" argument is not a
> relational expression, in CALCITE-3955, Julian has pointed out that we
> should translate the "table" argument as a CURSOR (or TABLE by Viliam), but
> another question is how to translate the referenced name "time_column", to
> a correlate variable ? which is also confusing because there is no
> correlation in the plan actually.
>
> Any suggestions are appreciated !
>
>
> [1] https://issues.apache.org/jira/browse/CALCITE-3955
> [2] https://issues.apache.org/jira/browse/CALCITE-3737
>
> Best,
> Danny Chan
>

Reply via email to