[
https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801078#comment-17801078
]
Colin McCabe commented on KAFKA-15650:
--------------------------------------
Based on our follow-up discussions, this is not an issue: a partition's
directory is initially in state UNASSIGNED and is only assigned later. (The
exception is when there is only a single directory, in which case the
controller makes the assignment.)
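
The rule described in this comment can be sketched roughly as follows. This is
only an illustration, not Kafka's code: the class name, the method name, and
the use of plain strings as directory identifiers are all assumptions made
for the example.

```java
import java.util.List;

final class InitialDirectoryAssignment {
    // Hypothetical marker for "no directory chosen yet"; not Kafka's actual type.
    static final String UNASSIGNED = "UNASSIGNED";

    // Returns the directory recorded for a newly created partition replica.
    static String initialDirectory(List<String> brokerLogDirs) {
        if (brokerLogDirs.size() == 1) {
            // Only one possible directory, so it can be assigned up front.
            return brokerLogDirs.get(0);
        }
        // Otherwise the replica starts UNASSIGNED and gets a directory later.
        return UNASSIGNED;
    }

    public static void main(String[] args) {
        System.out.println(initialDirectory(List.of("dir-a")));
        System.out.println(initialDirectory(List.of("dir-a", "dir-b")));
    }
}
```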
> Data-loss on leader shutdown right after partition creation?
> ------------------------------------------------------------
>
> Key: KAFKA-15650
> URL: https://issues.apache.org/jira/browse/KAFKA-15650
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Igor Soarez
> Priority: Major
>
> As per KIP-858, when a replica is created, the broker selects a log directory
> to host the replica and queues the propagation of the directory assignment to
> the controller. The replica becomes active immediately; activation does not
> wait for the controller to confirm the metadata change. If the replica is the
> leader replica, it can immediately start accepting writes.
> Consider the following scenario:
> 1. A partition is created in some selected log directory, and some produce
>    traffic is accepted.
> 2. Before the broker is able to notify the controller of the directory
>    assignment, the broker shuts down.
> 3. Upon coming back online, the broker has an offline directory, the same
>    directory which was chosen to host the replica.
> 4. The broker assumes leadership for the replica, but cannot find it in any
>    available directory and has no way of knowing it was already created because
>    the directory assignment is still missing.
> 5. The replica is created and the previously produced records are lost.
> Step 4 may seem unlikely because ISR membership gates leadership, but even
> assuming acks=all and replicas>1, if all other replicas are also offline, the
> broker may still gain leadership. Perhaps KIP-966 is relevant here.
> We may need to delay new replica activation until the assignment is
> propagated successfully.
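
The scenario in the quoted description can be reproduced with a toy model.
This is a sketch under stated assumptions, not broker code: the class, its
fields, and the `becomeLeader` logic are hypothetical, and a single replica
with string-named directories stands in for the real log manager. A `null`
value for `controllerKnownDir` models an assignment that was never propagated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LostAssignmentScenario {
    // Directory name -> records of the single replica stored there (toy model).
    final Map<String, List<String>> replicaByDir = new HashMap<>();
    final Set<String> onlineDirs = new LinkedHashSet<>();
    // Directory the controller has on record; null = assignment never propagated.
    String controllerKnownDir = null;

    LostAssignmentScenario(String... dirs) {
        onlineDirs.addAll(List.of(dirs));
    }

    void createReplica(String dir) {
        replicaByDir.put(dir, new ArrayList<>());
    }

    void produce(String dir, String record) {
        replicaByDir.get(dir).add(record);
    }

    // Returns the records the broker serves after assuming leadership.
    List<String> becomeLeader() {
        for (String dir : onlineDirs) {
            if (replicaByDir.containsKey(dir)) {
                return replicaByDir.get(dir); // replica found in an online directory
            }
        }
        if (controllerKnownDir != null && !onlineDirs.contains(controllerKnownDir)) {
            // With a recorded assignment the broker can tell the replica is offline.
            throw new IllegalStateException("replica is in offline directory " + controllerKnownDir);
        }
        if (onlineDirs.isEmpty()) {
            throw new IllegalStateException("no online directories");
        }
        // No assignment on record: the broker cannot know the replica ever
        // existed, so it creates an empty one (step 5 of the scenario).
        String fresh = onlineDirs.iterator().next();
        createReplica(fresh);
        return replicaByDir.get(fresh);
    }

    public static void main(String[] args) {
        LostAssignmentScenario broker = new LostAssignmentScenario("dir-a", "dir-b");
        broker.createReplica("dir-b");      // step 1: replica created, writes accepted
        broker.produce("dir-b", "record-1");
        broker.onlineDirs.remove("dir-b");  // steps 2-3: restart with dir-b offline
        System.out.println("records after restart: " + broker.becomeLeader().size());
    }
}
```

In this model, setting `controllerKnownDir` to `"dir-b"` before the restart
makes `becomeLeader` fail instead of silently creating an empty replica, which
is the distinction the missing directory assignment removes.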
--
This message was sent by Atlassian Jira
(v8.20.10#820010)