gaurav-narula commented on code in PR #15136:
URL: https://github.com/apache/kafka/pull/15136#discussion_r1549695928


##########
core/src/main/scala/kafka/log/LogManager.scala:
##########
@@ -1173,6 +1173,35 @@ class LogManager(logDirs: Seq[File],
     }
   }

+  def recoverAbandonedFutureLogs(brokerId: Int, newTopicsImage: TopicsImage): Unit = {
+    val abandonedFutureLogs = findAbandonedFutureLogs(brokerId, newTopicsImage)
+    abandonedFutureLogs.foreach { log =>
+      val tp = log.topicPartition
+
+      log.renameDir(UnifiedLog.logDirName(tp), shouldReinitialize = true)
+      log.removeLogMetrics()
+      futureLogs.remove(tp)
+
+      currentLogs.put(tp, log)
+      log.newMetrics()
+
+      info(s"Successfully renamed abandoned future log for $tp")
+    }
+  }
+
+  private def findAbandonedFutureLogs(brokerId: Int, newTopicsImage: TopicsImage): Iterable[UnifiedLog] = {
+    futureLogs.values.flatMap { log =>
+      val topicId = log.topicId.getOrElse {
+        throw new RuntimeException(s"The log dir $log does not have a topic ID, " +
+          "which is not allowed when running in KRaft mode.")
+      }
+      val partitionId = log.topicPartition.partition()
+      Option(newTopicsImage.getPartition(topicId, partitionId))
+        .filter(pr => directoryId(log.parentDir).contains(pr.directory(brokerId)))
+        .map(_ => log)

Review Comment:
   Thanks for the feedback. For (2), we have a couple of options. We can either:

   (a) ignore the future replica (say, in dir2) if the main replica exists in an online log dir (say, dir1), or
   (b) promote the future replica (in dir2) and remove the main replica (in dir1).

   With (a), ReplicaManager would spawn a replicaAlterLogDir thread for the future replica and correct the assignment to dir1, only for the assignment to change back to dir2 once the replicaAlterLogDir thread finishes its job.
   Refer to https://github.com/apache/kafka/blob/acecd370cc3b25f12926e7a4664a2648f08c6c9a/core/src/main/scala/kafka/server/ReplicaManager.scala#L2734 and https://github.com/apache/kafka/blob/acecd370cc3b25f12926e7a4664a2648f08c6c9a/core/src/main/scala/kafka/server/ReplicaManager.scala#L2745

   Since the future replica is almost caught up with the main replica in these scenarios, I'm leaning towards option (b) to avoid further reassignments. Please let me know if you feel otherwise.