[ https://issues.apache.org/jira/browse/FLINK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949717#comment-15949717 ]
ASF GitHub Bot commented on FLINK-5654:
---------------------------------------

Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/3641

@fhueske Thanks for the feedback. I can of course make the modifications you mentioned. However, I do not believe that is the correct behavior (or, better said, not for all the cases we need). From my understanding of the semantics of OVER, even if the function works on time (proctime/eventtime), we should still emit a value for every incoming event. I have two arguments for this:

1) In our scenarios we would use the current implementation to, for example, compute certain statistics for every incoming event, where the scope of the statistic is defined as a certain period of time (this functionality is highly needed). For example, in a stock market scenario you might ask for the sum of the transactions over the last hour (potentially to check a threshold for the liquidity of the market). Because the application needs to react to each incoming transaction (e.g. decide whether or not to buy), the behavior you described would not enable such a scenario. Moreover, even if both behaviors were needed, what query could you write to get the behavior you described and to distinguish it from the other?

2) Consider the case of event time, which should behave similarly to proctime. When you get an event (ev1, time1), you should not emit its output until you know whether another event with the same event time, (ev2, time1), arrives at some later point. Basically you would register a timer for the acceptable watermark/allowed latency, accumulate the accumulator for a specific event time, and emit it after the allowed latency has passed. Is this the behavior that is implemented or would be implemented?
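To make the two emission behaviors discussed above concrete, here is a minimal Python sketch. It is not Flink code and does not reflect the implementation in the pull request; the function names `sums_per_event` and `sums_per_timestamp`, the millisecond timestamps, and the sample data are all assumptions made for illustration. It also assumes events arrive ordered by timestamp and treats the one-hour lower bound as inclusive.

```python
from collections import deque
from itertools import groupby

HOUR_MS = 60 * 60 * 1000  # assumed time unit: milliseconds


def _evict(window, total, now, range_ms):
    """Drop rows older than `now - range_ms` and return the adjusted sum."""
    while window and window[0][0] < now - range_ms:
        _, old_value = window.popleft()
        total -= old_value
    return total


def sums_per_event(events, range_ms=HOUR_MS):
    """Emit a sum for every incoming event as soon as it arrives."""
    window, total, out = deque(), 0, []
    for ts, value in events:
        window.append((ts, value))
        total = _evict(window, total + value, ts, range_ms)
        out.append((ts, value, total))
    return out


def sums_per_timestamp(events, range_ms=HOUR_MS):
    """Hold back rows that share a timestamp and emit them together,
    so every row with the same timestamp sees the same sum."""
    window, total, out = deque(), 0, []
    for ts, group in groupby(events, key=lambda e: e[0]):
        peers = list(group)
        for _, value in peers:
            window.append((ts, value))
            total += value
        total = _evict(window, total, ts, range_ms)
        out.extend((ts, value, total) for _, value in peers)
    return out


# Hypothetical (timestamp_ms, amount) transactions, ordered by timestamp.
events = [(0, 10), (0, 5), (1_800_000, 7), (3_600_000, 1), (3_600_001, 2)]
print(sums_per_event(events))
print(sums_per_timestamp(events))
```

The two functions differ only for rows that share a timestamp: the first reports 10 and then 15 for the two transactions at time 0, while the second reports 15 for both, but only once it knows that no further row with that timestamp will arrive.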
> Add processing time OVER RANGE BETWEEN x PRECEDING aggregation to SQL
> ---------------------------------------------------------------------
>
>                 Key: FLINK-5654
>                 URL: https://issues.apache.org/jira/browse/FLINK-5654
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: radu
>
> The goal of this issue is to add support for OVER RANGE aggregations on
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
>   a,
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single
>   threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a
>   parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5657)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some
> of the restrictions are trivial to address, we can add the functionality in
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with
>   RexOver expression).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)