[ 
https://issues.apache.org/jira/browse/KAFKA-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815403#comment-17815403
 ] 

Gaurav Narula commented on KAFKA-16234:
---------------------------------------

Perhaps a way to solve this would be to determine if a log is a stray replica 
at the time we load it and not after all logs have been loaded.

> Log directory failure re-creates partitions in another logdir automatically
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-16234
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16234
>             Project: Kafka
>          Issue Type: Bug
>          Components: jbod
>    Affects Versions: 3.7.0
>            Reporter: Gaurav Narula
>            Assignee: Omnia Ibrahim
>            Priority: Major
>
> With [KAFKA-16157|https://github.com/apache/kafka/pull/15263] we changed the 
> {{HostedPartition.Offline}} enum variant to embed a {{Partition}} object. 
> Further, {{ReplicaManager::getOrCreatePartition}} compares the old and new 
> topicIds to decide whether it needs to create a new log.
> The getter for {{Partition::topicId}} relies on retrieving the topicId from 
> the {{log}} field or from {{logManager.currentLogs}}. The former is set to 
> {{None}} when a partition is marked offline, and the key for the partition is 
> removed from the latter by {{LogManager::handleLogDirFailure}}. 
> Therefore, the topicId of a partition marked offline is always {{None}}, 
> and new logs for all partitions in a failed log directory are always created 
> on another disk.
> The broker will fail to restart after the failed disk is repaired because 
> the same partitions will exist in two different directories. The error does, 
> however, tell the operator to remove the partitions from the disk that 
> failed, which should allow the broker to start.
> We can avoid this with KAFKA-16212, but in the short term an immediate 
> solution could be to have the {{Partition}} object accept an 
> {{Option[TopicId]}} in its constructor and fall back to {{log}} or 
> {{logManager}} if it is unset.
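
To make the short-term fix suggested in the description concrete, here is a minimal sketch of a topicId getter that prefers a value passed at construction and only then falls back to the log or the log manager. It is not Kafka's actual code: {{TopicId}}, {{Log}}, the {{explicitTopicId}} parameter and the two lookup functions are simplified, hypothetical stand-ins chosen so the example is self-contained.

```scala
// Simplified stand-ins for illustration only; these are NOT Kafka's real classes.
final case class TopicId(value: String)
final case class Log(topicId: Option[TopicId])

class Partition(
    name: String,
    explicitTopicId: Option[TopicId],        // hypothetical constructor parameter
    currentLog: () => Option[Log],           // stands in for the `log` field
    logManagerLookup: String => Option[Log]  // stands in for logManager.currentLogs
) {
  // Prefer the id supplied at construction; otherwise fall back to the log,
  // and finally to the log manager's view of the partition.
  def topicId: Option[TopicId] =
    explicitTopicId
      .orElse(currentLog().flatMap(_.topicId))
      .orElse(logManagerLookup(name).flatMap(_.topicId))
}
```

Under these assumptions, a partition that is marked offline (no {{log}}, and no entry left in the log manager) still reports the topicId it was constructed with, so a comparison of old and new topicIds would no longer see {{None}}.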



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
