[ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066115#comment-16066115 ]
ASF GitHub Bot commented on FLINK-6969: --------------------------------------- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/4183 Hi @wuchong, that's an interesting idea but I think it has the drawback that it might add latency. A timestamp extractor is only called with records and doesn't see watermarks. Since an operator might emit multiple records for the same timestamp, a timestamp extractor would always have to emit a watermark of last timestamp - 1 (we can be sure that the records are emitted in timestamp order) because it does not know which record is the last for a timestamp. So, we would add a latency of one window length (until the next window is processed). Custom operators are a low level interface but it shouldn't be too hard to implement one that holds watermarks back. > Add support for deferred computation for group window aggregates > ---------------------------------------------------------------- > > Key: FLINK-6969 > URL: https://issues.apache.org/jira/browse/FLINK-6969 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL > Reporter: Fabian Hueske > Assignee: sunjincheng > > Deferred computation is a strategy to deal with late arriving data and avoid > updates of previous results. Instead of computing a result as soon as it is > possible (i.e., when a corresponding watermark was received), deferred > computation adds a configurable amount of slack time in which late data is > accepted before the result is compute. For example, instead of computing a > tumbling window of 1 hour at each full hour, we can add a deferred > computation interval of 15 minute to compute the result quarter past each > full hour. > This approach adds latency but can reduce the number of update esp. in use > cases where the user cannot influence the generation of watermarks. It is > also useful if the data is emitted to a system that cannot update result > (files or Kafka). The deferred computation interval should be configured via > the {{QueryConfig}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)