[ 
https://issues.apache.org/jira/browse/FLINK-37399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-37399:
-----------------------------------
    Description: 
Imagine the following scenario. Max allowed watermark alignment drift is set to 
30s and watermark alignment is synced/announced across subtasks every 1s.  This 
makes perfect sense during normal records processing.

But when processing messages from backlog, it creates a problem, defacto 
capping how quickly watermarks can be progressing. For example:
* at {{t0}} we announce that max allowed watermark is {{max_w0 = 
min(reported_watermark) + 30s}}
* next announcement will happen at {{t1 = t0 + 1s}}
* any source/partition exceeds {{max_w0}} until the next announcement happens 
at {{t1}}, it will be blocked by the watermark alignment

In other words, with the above configuration (30s allowed drift announced every 
~1s), we are de facto capping the backlog processing speed to:

*30 "event time" seconds per each 1 "real world" second*

The problem is the delay in how quickly watermark announcements are propagating 
between operators and the coordinator.

  was:
Imagine the following scenario. Max allowed watermark alignment drift is set to 
30s and watermark alignment is synced/announced across subtasks every 1s.  This 
makes perfect sense during normal records processing.

But when processing messages from backlog, it creates a problem, defacto 
capping how quickly watermarks can be progressing. For example:
* at {{t0}} we announce that max allowed watermark is {{max_w0 = 
min(reported_watermark) + 30s}}
* next announcement will happen at {{t1 = t0 + 1s }}
* any source/partition exceeds {{max_w0}} until the next announcement happens 
at {{t1}}, it will be blocked by the watermark alignment

In other words, with the above configuration (30s allowed drift announced every 
~1s), we are de facto capping the backlog processing speed to:

*30 "event time" seconds per each 1 "real world" second*

The problem is the delay in how quickly watermark announcements are propagating 
between operators and the coordinator.


> Watermark alignment can prevent backlogged jobs from using all available 
> resources
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-37399
>                 URL: https://issues.apache.org/jira/browse/FLINK-37399
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>    Affects Versions: 2.0.0, 1.18.1, 1.19.2
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>
> Imagine the following scenario. Max allowed watermark alignment drift is set 
> to 30s and watermark alignment is synced/announced across subtasks every 1s.  
> This makes perfect sense during normal records processing.
> But when processing messages from backlog, it creates a problem, defacto 
> capping how quickly watermarks can be progressing. For example:
> * at {{t0}} we announce that max allowed watermark is {{max_w0 = 
> min(reported_watermark) + 30s}}
> * next announcement will happen at {{t1 = t0 + 1s}}
> * any source/partition exceeds {{max_w0}} until the next announcement happens 
> at {{t1}}, it will be blocked by the watermark alignment
> In other words, with the above configuration (30s allowed drift announced 
> every ~1s), we are de facto capping the backlog processing speed to:
> *30 "event time" seconds per each 1 "real world" second*
> The problem is the delay in how quickly watermark announcements are 
> propagating between operators and the coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to