Xingcan Cui created FLINK-36914:
-----------------------------------
Summary: Sources with watermark alignment should wait for
watermark generation
Key: FLINK-36914
URL: https://issues.apache.org/jira/browse/FLINK-36914
Project: Flink
Issue Type: Improvement
Components: API / DataStream
Reporter: Xingcan Cui
When watermark alignment is enabled across multiple sources, event ingestion
should pause until every source has generated at least one watermark. Currently,
however, a source that has not produced a watermark (for whatever reason) does
not participate in the computation of {{MaxAllowedWatermark}}. This happens
whenever a job restarts and the sources do not all start reading data at
exactly the same time.
Watermark alignment is usually enabled when users need to control the state
size of a job by not buffering data that cannot be used yet. So when a source
has not produced a watermark, event ingestion from the other sources should be
held back instead of allowing them to read more events.
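To make the difference concrete, here is a minimal, self-contained sketch of
the two behaviors. It is a simplified model, not Flink's actual code: the class
AlignmentSketch, the NO_WATERMARK sentinel, and all method names are
hypothetical, and it assumes the limit is the minimum reported watermark plus
the configured maximum drift.

```java
import java.util.Arrays;
import java.util.OptionalLong;

// Simplified, hypothetical model of the alignment limit. Not Flink code.
final class AlignmentSketch {
    // Hypothetical sentinel: the source has not reported any watermark yet.
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Behavior described in this ticket as current: sources without a
    // watermark are skipped, so they cannot hold the others back.
    public static long currentLimit(long[] reported, long maxDriftMs) {
        OptionalLong min =
                Arrays.stream(reported).filter(w -> w != NO_WATERMARK).min();
        // With no reports at all there is nothing to align against.
        return min.isPresent() ? min.getAsLong() + maxDriftMs : Long.MAX_VALUE;
    }

    // Proposed behavior: while any source is missing a watermark, the limit
    // stays at the sentinel, which pauses every source that already has one.
    public static long proposedLimit(long[] reported, long maxDriftMs) {
        boolean anyMissing =
                Arrays.stream(reported).anyMatch(w -> w == NO_WATERMARK);
        return anyMissing ? NO_WATERMARK : currentLimit(reported, maxDriftMs);
    }

    // A source without a watermark must keep reading to produce its first
    // one; a source with a watermark may read only while it is within the limit.
    public static boolean mayIngest(long ownWatermark, long limit) {
        return ownWatermark == NO_WATERMARK || ownWatermark <= limit;
    }

    public static void main(String[] args) {
        // Source A has no watermark yet; source B is already at 10_000 ms.
        long[] reported = {NO_WATERMARK, 10_000L};
        long drift = 1_000L;

        long current = currentLimit(reported, drift);
        long proposed = proposedLimit(reported, drift);

        System.out.println("current:  B may ingest = " + mayIngest(10_000L, current));
        System.out.println("proposed: B may ingest = " + mayIngest(10_000L, proposed));
        System.out.println("proposed: A may ingest = " + mayIngest(NO_WATERMARK, proposed));
    }
}
```

In this model, source B keeps reading under the current behavior because A is
ignored, while under the proposed behavior B pauses until A reports its first
watermark. Note that A must still be allowed to read in order to produce that
watermark.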
Regarding implementation, I'm unsure if the refactored Flink source API can
easily support "peek and generate watermarks". Some new mechanisms might need
to be introduced to support it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)