[jira] [Commented] (KAFKA-20035) Prevent data loss during partition expansion by enforcing "earliest" offset reset for dynamically added partitions

fujian (Jira) Wed, 25 Feb 2026 21:42:50 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061274#comment-18061274
 ]


fujian commented on KAFKA-20035:
--------------------------------

[~dajac] Thanks for your comments. I think the reset-by-duration option 
proposed in the KIP is one possible mitigation, but it makes it easy to 
reprocess some messages. So it addresses this issue by introducing another one.

If we want to use a workaround for this operational issue instead of fixing it 
at the code level, we could follow these steps:
(1) set auto.offset.reset=earliest
(2) perform the partition expansion
(3) after the operation is complete, set auto.offset.reset back to latest
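
To illustrate why the workaround helps, here is a minimal Python sketch of a 
simplified model, not the actual consumer code. The function name 
initial_position and the offsets are made up for illustration; the only 
behavior assumed is that the reset policy is consulted when a partition has no 
committed offset.

```python
from typing import Optional


def initial_position(policy: str, log_start: int, log_end: int,
                     committed: Optional[int] = None) -> int:
    """Offset at which a consumer starts reading a partition (simplified)."""
    if committed is not None:
        return committed      # a committed offset takes precedence over the policy
    if policy == "earliest":
        return log_start      # begin at the first offset still in the log
    if policy == "latest":
        return log_end        # begin after everything already written
    raise ValueError(f"policy not modelled in this sketch: {policy}")


# Hypothetical new partition: 5 records (offsets 0..4) were produced
# before the partition was assigned to a group member.
log_start, log_end = 0, 5

skipped_latest = initial_position("latest", log_start, log_end) - log_start
skipped_earliest = initial_position("earliest", log_start, log_end) - log_start
print("skipped with latest:", skipped_latest)
print("skipped with earliest:", skipped_earliest)
```

In this model, switching back to latest in step (3) does not affect partitions 
that already have a committed offset, because the committed offset is used 
instead of the policy.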

In practice, if I understand correctly, some messages may still be lost due to 
retention or truncation regardless of which strategy is used. The main 
difference is the amount of data lost: using earliest may result in fewer lost 
messages.
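
As a rough illustration of the difference in the amount lost, here is a small 
Python sketch. It assumes a hypothetical partition whose offsets start at 0, a 
consumer with no committed offset, and a made-up helper named lost_records; the 
numbers are invented for the example.

```python
def lost_records(policy: str, produced: int, log_start: int) -> int:
    """Records a newly assigned consumer never sees (simplified model).

    produced:  number of records written so far (the log end offset)
    log_start: first offset still available after retention or truncation
    """
    start = log_start if policy == "earliest" else produced
    return start  # offsets 0 .. start-1 are never read


# Hypothetical numbers: 100 records written, retention already removed 20.
lost_with_earliest = lost_records("earliest", produced=100, log_start=20)
lost_with_latest = lost_records("latest", produced=100, log_start=20)
print(lost_with_earliest, lost_with_latest)
```

With earliest, the loss is bounded by what retention has already removed; with 
latest, everything written before the assignment is skipped.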

So the key difference is the cause of the message loss. If the loss is caused 
by retention, it is usually due to slow consumers or unreasonable retention 
settings. This can be fixed by adjusting the configuration or resolving the 
slow consumer issue, and it can reasonably be explained to the user as an issue 
on the Kafka user's side (why was the retention not set larger? why is the 
consumer so slow?). However, if the loss is caused by partition expansion, it 
is much harder to prevent and also harder to clearly explain to users.

So if the issue can be fixed by a very simple code change, that would be a 
nice thing.

Thanks again.

> Prevent data loss during partition expansion by enforcing "earliest" offset 
> reset for dynamically added partitions
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20035
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20035
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, core, group-coordinator
>            Reporter: Chia-Ping Tsai
>            Assignee: Ken Huang
>            Priority: Critical
>              Labels: kip
>
> Currently, when a consumer group is configured with {{auto.offset.reset = 
> latest}}, dynamically adding new partitions to a subscribed topic can lead 
> to data loss due to a race condition.
> The scenario is as follows:
>  # A group subscribes to a topic with {{auto.offset.reset = latest}}.
>  # The topic is expanded (e.g., from 3 to 4 partitions).
>  # Producers immediately start writing data to the new partition (Partition 
> 3).
>  # The Group Coordinator detects the change and assigns Partition 3 to a 
> member.
>  # The member initializes the partition. Since there is no committed offset, 
> it applies the
>  # *Result: Any messages written to Partition 3 between step 3 and step 5 are 
> skipped and lost.*
> From a user's perspective, {{latest}} should mean "start consuming from the 
> point of subscription," not "skip data from newly created infrastructure."
> KIP-1282: 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619800] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
