Hongshun Wang created FLINK-32019:
-------------------------------------
Summary: EARLIEST offset strategy for partitions discoveried later
based on FLIP-288
Key: FLINK-32019
URL: https://issues.apache.org/jira/browse/FLINK-32019
Project: Flink
Issue Type: Sub-task
Components: Connectors / Kafka
Affects Versions: kafka-3.0.0
Reporter: Hongshun Wang
Fix For: kafka-4.0.0
As described in
[FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source],
the strategy used for new partitions is the same as the initial offset
strategy, which is not reasonable.
According to the semantics, if the startup strategy is latest, the consumed
data should include all data from the moment of startup, which also includes
all messages from new created partitions. However, the latest strategy
currently maybe used for new partitions, leading to the loss of some data
(thinking a new partition is created and might be discovered by Kafka source
several minutes later, and the message produced into the partition within the
gap might be dropped if we use for example "latest" as the initial offset
strategy).if the data from all new partitions is not read, it does not meet the
user's expectations.
Other ploblems see final Section of
[FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source]:
{{User specifies OffsetsInitializer for new partition}} .
Therefore, it’s better to provide an *EARLIEST* strategy for later discovered
partitions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)