[ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066186#comment-16066186 ]
ASF GitHub Bot commented on FLINK-6969: --------------------------------------- Github user sunjincheng121 commented on the issue: https://github.com/apache/flink/pull/4183 Hi @fhueske @wuchong Thanks for your reviewing and comments. Thanks! 1. For the param rename I am not sure whether it can be shared with `early fire` feature or not. I suggest using current name, and we can change it if we need when we dev the `early fire` feature. And feel free to rename it in this PR. if you and @wuchong Insist on to renaming, I am fine about that. :) 2. About timestamp and watermark: - Timestamp: I think we can emit records with the correct timestamps(late than watermark,but corresponds to window time ), I thinks the code `timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of `WindowOperator#emitWindowContents` can guarantee that logic. That's meant it late than watermark,but is correct. - Watermark: In current flink framework,GroupWindow and OverWindow related the `Watermark`. So If i understand you correctly, you worry about GroupWindow followed by a GroupWindow or a OverWindow. Let's see follows: - For followed by GroupWindow case: As we know `deferredComputationTime` is global configuration, i.e. In one job all GroupWindow will using the same TRIGGER(`DeferredComputationTrigger`), that only fires when current watermark not smaller than ` (window.maxTimestamp + queryConfig.getDeferredComputationTime)`. The records will be late emitted by the Level N window, So dose the Level N+1 window. The late always is `getDeferredComputationTime` time. i.e., This approach adds latency but can reduce the number of update esp.(that we wanted) - For followed by OverWindow case: I think this approach works well for row-time OverWindow, because Over Clause using timestamp value range the window. I think it works well If we emit correct timestamp for records.(And we did it by `timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of `WindowOperator#emitWindowContents`) - For select, filter...etc. I think also work well (adds latency). 3. About `Trigger+ AssignerWithPunctuatedWatermarks` (@wuchong comment above), Trigger is late `deferredComputationTime` and Watermark change it smaller (`deferredComputationTime`), If we have `window1(..).winodw2(...).window3()`. the delay is increasing. **So, the current approach only improved Level 1 group window, and end-to-end latency is `deferredComputationTime`.** **If we want let all the window only fired late `deferredComputationTime`, we should think about SLA mechanism. (Which we had discussed before).** The above description just from point of my view. So feel free to correct me if there are any incorrect analysis. Please let me know what you think? Best, SunJincheng > Add support for deferred computation for group window aggregates > ---------------------------------------------------------------- > > Key: FLINK-6969 > URL: https://issues.apache.org/jira/browse/FLINK-6969 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL > Reporter: Fabian Hueske > Assignee: sunjincheng > > Deferred computation is a strategy to deal with late arriving data and avoid > updates of previous results. Instead of computing a result as soon as it is > possible (i.e., when a corresponding watermark was received), deferred > computation adds a configurable amount of slack time in which late data is > accepted before the result is compute. For example, instead of computing a > tumbling window of 1 hour at each full hour, we can add a deferred > computation interval of 15 minute to compute the result quarter past each > full hour. > This approach adds latency but can reduce the number of update esp. in use > cases where the user cannot influence the generation of watermarks. It is > also useful if the data is emitted to a system that cannot update result > (files or Kafka). The deferred computation interval should be configured via > the {{QueryConfig}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)