[jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates

ASF GitHub Bot (JIRA) Mon, 10 Jul 2017 07:43:28 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080429#comment-16080429
 ]


ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    I think support for watermark configuration in a DDL statement is a nice 
feature but not a replacement for this parameter.
    
    I am thinking of a use case where streaming tables are registered in a 
catalog. Each table has some kind of watermark strategy, which might be 
conservative (little late data, higher latency) or eager (more late data, lower 
latency). Users want to query these streaming tables but might need more 
conservative or more eager watermarks. Of course, they don't want to register 
the tables again (with all the additional information required) just to adapt 
the watermarks for their use case. With this parameter, they can simply adjust 
the watermarks of the already registered tables.
    
    While writing this, I noticed that it would make sense to optionally add 
the name of the table to which the watermark manipulation should be applied. 
Some tables might already have good watermarks while others need to be adjusted.
    So we would have two configuration methods:
    
    ```
    // adjusts the watermarks of all input tables
    def withLateDataTimeOffset(lateDataTimeOffset: Time): StreamQueryConfig
    
    // adjusts the watermarks of a specific input table
    def withLateDataTimeOffset(table: String, lateDataTimeOffset: Time): StreamQueryConfig
    ```
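
To make the interplay of the two overloads concrete, here is a minimal, self-contained sketch. It is not Flink's `StreamQueryConfig`: the class name `SketchQueryConfig`, the `offsetFor` helper, the use of plain milliseconds instead of `Time`, and the rule that a per-table offset takes precedence over the global one are all assumptions made for illustration.

```scala
// Hypothetical stand-in for the proposed configuration; NOT Flink's StreamQueryConfig.
// Offsets are plain milliseconds here instead of Flink's Time type (assumption).
final case class SketchQueryConfig(
    defaultOffsetMs: Long = 0L,
    perTableOffsetMs: Map[String, Long] = Map.empty) {

  // Adjusts the watermarks of all input tables.
  def withLateDataTimeOffset(offsetMs: Long): SketchQueryConfig =
    copy(defaultOffsetMs = offsetMs)

  // Adjusts the watermarks of a specific input table.
  def withLateDataTimeOffset(table: String, offsetMs: Long): SketchQueryConfig =
    copy(perTableOffsetMs = perTableOffsetMs + (table -> offsetMs))

  // Assumed resolution rule: a per-table offset overrides the global one.
  def offsetFor(table: String): Long =
    perTableOffsetMs.getOrElse(table, defaultOffsetMs)
}
```

With this sketch, `SketchQueryConfig().withLateDataTimeOffset(60000L).withLateDataTimeOffset("clicks", 300000L)` would resolve to a five-minute offset for a table named `clicks` (a made-up name) and a one-minute offset for every other table.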
    
    What do you think @sunjincheng121 and @wuchong?


> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>         Attachments: screenshot-1.png
>
>
> Deferred computation is a strategy to deal with late arriving data and avoid 
> updates of previous results. Instead of computing a result as soon as it is 
> possible (i.e., when a corresponding watermark was received), deferred 
> computation adds a configurable amount of slack time in which late data is 
> accepted before the result is computed. For example, instead of computing a 
> tumbling window of 1 hour at each full hour, we can add a deferred 
> computation interval of 15 minutes to compute the result at a quarter past 
> each full hour.
> This approach adds latency but can reduce the number of updates, especially 
> in use cases where the user cannot influence the generation of watermarks. It 
> is also useful if the data is emitted to a system that cannot update results 
> (files or Kafka). The deferred computation interval should be configured via 
> the {{QueryConfig}}.
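
The arithmetic in the example above can be sketched as follows. This is only an illustration of the timing, not Flink code: the object and method names are hypothetical, timestamps are assumed to be non-negative epoch milliseconds, and in Flink the actual firing would be driven by watermarks rather than computed directly like this.

```scala
// Hypothetical helper illustrating the timing of deferred computation; not part of Flink.
object DeferredWindowSketch {

  // End (exclusive) of the tumbling window that contains `timestampMs`.
  // Assumes non-negative timestamps and windows aligned to the epoch.
  def windowEnd(timestampMs: Long, windowSizeMs: Long): Long =
    timestampMs - (timestampMs % windowSizeMs) + windowSizeMs

  // Point in event time at which the result would be computed when the
  // computation is deferred by `deferralMs` past the end of the window.
  def fireTime(timestampMs: Long, windowSizeMs: Long, deferralMs: Long): Long =
    windowEnd(timestampMs, windowSizeMs) + deferralMs
}
```

For a record at 10:20 with a 1 hour window and a 15 minute deferral, `windowEnd` gives 11:00 and `fireTime` gives 11:15, matching the "quarter past each full hour" behaviour described in the issue.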



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
