[ 
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066115#comment-16066115
 ] 

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Hi @wuchong, that's an interesting idea but I think it has the drawback 
that it might add latency. 
    
    A timestamp extractor is only called with records and doesn't see 
watermarks. Since an operator might emit multiple records for the same 
timestamp, a timestamp extractor would always have to emit a watermark of the 
last timestamp minus 1 (we can be sure that the records are emitted in 
timestamp order) because it does not know which record is the last one for a 
timestamp. So, we would add a latency of one window length (until the next 
window is processed).
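
    To illustrate the latency argument above, here is a minimal sketch in plain Python. It is not Flink code: the function names and the sample timestamps are hypothetical, and it only models an extractor that sees records (never watermarks) and therefore emits "timestamp - 1" after each record.

```python
def extractor_watermarks(record_timestamps):
    """Watermark emitted after each record by an extractor that only sees
    records: it can never safely emit more than (timestamp - 1)."""
    return [ts - 1 for ts in record_timestamps]


def first_index_covering(ts, watermarks):
    """Index of the first emitted watermark that is >= ts, or None."""
    for i, wm in enumerate(watermarks):
        if wm >= ts:
            return i
    return None


# Hypothetical operator output, emitted in timestamp order, with several
# result records sharing the same window timestamp.
records = [1000, 1000, 2000, 2000, 3000]
watermarks = extractor_watermarks(records)

# Timestamp 1000 is only covered once a record of the *next* window arrives.
covered_at = first_index_covering(1000, watermarks)
```

    In this sketch the records with timestamp 1000 are only covered by a watermark once the first record with timestamp 2000 shows up, which is the extra window length of latency described above. The last timestamp is never covered unless more records arrive.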
    
    Custom operators are a low-level interface, but it shouldn't be too hard to 
implement one that holds watermarks back.
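
    As a rough sketch of what "holding watermarks back" could mean, the following plain-Python model forwards every incoming watermark reduced by a fixed delay. It does not use the Flink operator API; the class `WatermarkHoldback`, the helper `window_fires`, and the 15-minute delay are assumptions for illustration only. The firing rule assumed here is that an event-time window fires once the watermark reaches the window end minus 1.

```python
class WatermarkHoldback:
    """Hypothetical stand-in for a custom operator that passes records
    through unchanged and forwards each watermark minus a fixed delay."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms

    def process_watermark(self, watermark_ms):
        # Forward a watermark that lags the incoming one by delay_ms.
        return watermark_ms - self.delay_ms


def window_fires(window_end_ms, forwarded_watermark_ms):
    # Assumed rule: a window fires when the watermark reaches (end - 1).
    return forwarded_watermark_ms >= window_end_ms - 1


HOUR_MS = 60 * 60 * 1000
QUARTER_MS = 15 * 60 * 1000

holdback = WatermarkHoldback(delay_ms=QUARTER_MS)

# Incoming watermark at the full hour: the 1-hour window does not fire yet.
fires_at_full_hour = window_fires(HOUR_MS, holdback.process_watermark(HOUR_MS))

# Incoming watermark at quarter past: the window fires.
fires_at_quarter_past = window_fires(
    HOUR_MS, holdback.process_watermark(HOUR_MS + QUARTER_MS)
)
```

    Under these assumptions the downstream window is evaluated 15 minutes of event time later than it otherwise would be, so records arriving within that slack are still included in the first result.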


> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid 
> updates of previous results. Instead of computing a result as soon as it is 
> possible (i.e., when a corresponding watermark was received), deferred 
> computation adds a configurable amount of slack time in which late data is 
> accepted before the result is computed. For example, instead of computing a 
> tumbling window of 1 hour at each full hour, we can add a deferred 
> computation interval of 15 minutes to compute the result at quarter past each 
> full hour.
> This approach adds latency but can reduce the number of updates, esp. in use 
> cases where the user cannot influence the generation of watermarks. It is 
> also useful if the data is emitted to a system that cannot update results 
> (files or Kafka). The deferred computation interval should be configured via 
> the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
