Piotr Nowojski created FLINK-37399:
--------------------------------------

             Summary: Watermark alignment can prevent backlogged jobs from 
using all available resources
                 Key: FLINK-37399
                 URL: https://issues.apache.org/jira/browse/FLINK-37399
             Project: Flink
          Issue Type: Improvement
          Components: API / DataStream
    Affects Versions: 1.19.2, 1.18.1, 2.0.0
            Reporter: Piotr Nowojski
            Assignee: Piotr Nowojski


Imagine the following scenario. Max allowed watermark alignment drift is set to 
30s and watermark alignment is synced/announced across subtasks every 1s.  This 
makes perfect sense during normal records processing.

But when processing messages from backlog, it creates a problem, defacto 
capping how quickly watermarks can be progressing. For example:
* at {{t0}} we announce that max allowed watermark is {{max_w0 = 
min(reported_watermark) + 30s}}
* next announcement will happen at {{t1 = t0 + 1s }}
* any source/partition exceeds {{max_w0}} until the next announcement happens 
at {{t1}}, it will be blocked by the watermark alignment

In other words, with the above configuration (30s allowed drift announced every 
~1s), we are de facto capping the backlog processing speed to:

*30 "event time" seconds per each 1 "real world" second*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to