[ 
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066186#comment-16066186
 ] 

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Hi @fhueske @wuchong Thanks for your reviewing and comments. Thanks!
    
    1. For the param rename I am not sure whether it can be shared with `early 
fire` feature  or not. I suggest using current name, and we can change it if we 
need when we dev the `early fire` feature. And feel free to rename it in this 
PR. if you and @wuchong Insist on to renaming, I am fine about that. :)
    
    2. About timestamp and watermark:
    - Timestamp: 
    I think we can emit records with the correct timestamps(late than 
watermark,but corresponds to window time ), I thinks the code 
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of  
`WindowOperator#emitWindowContents` can guarantee that logic. That's meant it 
late than watermark,but is correct.
    
    - Watermark:
      In current flink framework,GroupWindow and OverWindow related the 
`Watermark`. So If i understand you correctly, you worry about GroupWindow 
followed by a GroupWindow or  a OverWindow. Let's see follows:
         - For followed by GroupWindow case:
          As we know  `deferredComputationTime` is global configuration, i.e. 
In one job all GroupWindow will using the same 
TRIGGER(`DeferredComputationTrigger`), that only fires when current watermark 
not smaller than ` (window.maxTimestamp + 
queryConfig.getDeferredComputationTime)`. The records will be late emitted by 
the Level N window, So dose the Level N+1 window. The late always is 
`getDeferredComputationTime` time. i.e., This approach adds latency but can 
reduce the number of update esp.(that we wanted)
    
         - For followed by OverWindow case:
           I think this approach works well for row-time OverWindow, because 
Over Clause using timestamp value range the window. I think it works well If we 
emit correct timestamp for records.(And we did it by 
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of  
`WindowOperator#emitWindowContents`) 
    
         - For select, filter...etc. I think also work well (adds latency).
    
    3.  About `Trigger+ AssignerWithPunctuatedWatermarks` (@wuchong comment 
above), Trigger is late `deferredComputationTime` and Watermark change it 
smaller (`deferredComputationTime`), If we have 
`window1(..).winodw2(...).window3()`. the delay is increasing.
    
    **So, the current approach only improved Level 1 group window, and 
end-to-end latency is `deferredComputationTime`.**
    
    **If we want let all the window only fired late `deferredComputationTime`, 
we should think about SLA mechanism. (Which we had discussed before).**
    
    The above description just from point of my view. So feel free to correct 
me if there are any incorrect analysis.
    
    Please let me know what you think? 
    
    Best,
    SunJincheng
    
      



> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid 
> updates of previous results. Instead of computing a result as soon as it is 
> possible (i.e., when a corresponding watermark was received), deferred 
> computation adds a configurable amount of slack time in which late data is 
> accepted before the result is compute. For example, instead of computing a 
> tumbling window of 1 hour at each full hour, we can add a deferred 
> computation interval of 15 minute to compute the result quarter past each 
> full hour.
> This approach adds latency but can reduce the number of update esp. in use 
> cases where the user cannot influence the generation of watermarks. It is 
> also useful if the data is emitted to a system that cannot update result 
> (files or Kafka). The deferred computation interval should be configured via 
> the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to