gaurav-narula commented on code in PR #15136:
URL: https://github.com/apache/kafka/pull/15136#discussion_r1549695928


##########
core/src/main/scala/kafka/log/LogManager.scala:
##########
@@ -1173,6 +1173,35 @@ class LogManager(logDirs: Seq[File],
     }
   }
 
+  def recoverAbandonedFutureLogs(brokerId: Int, newTopicsImage: TopicsImage): Unit = {
+    val abandonedFutureLogs = findAbandonedFutureLogs(brokerId, newTopicsImage)
+    abandonedFutureLogs.foreach { log =>
+      val tp = log.topicPartition
+
+      log.renameDir(UnifiedLog.logDirName(tp), shouldReinitialize = true)
+      log.removeLogMetrics()
+      futureLogs.remove(tp)
+
+      currentLogs.put(tp, log)
+      log.newMetrics()
+
+      info(s"Successfully renamed abandoned future log for $tp")
+    }
+  }
+
+  private def findAbandonedFutureLogs(brokerId: Int, newTopicsImage: TopicsImage): Iterable[UnifiedLog] = {
+    futureLogs.values.flatMap { log =>
+      val topicId = log.topicId.getOrElse {
+        throw new RuntimeException(s"The log dir $log does not have a topic ID, " +
+          "which is not allowed when running in KRaft mode.")
+      }
+      val partitionId = log.topicPartition.partition()
+      Option(newTopicsImage.getPartition(topicId, partitionId))
+        .filter(pr => directoryId(log.parentDir).contains(pr.directory(brokerId)))
+        .map(_ => log)

Review Comment:
   Thanks for the feedback.
   
   For (2), we have a couple of options. We can either:
   
   (a) ignore the future replica (say in dir2) if the main replica exists in an online log dir (say dir1), or
   (b) promote the future replica (in dir2) and remove the main replica (in dir1).
   
   (a) would result in ReplicaManager spawning a replicaAlterLogDir thread for the future replica and correcting the assignment to dir1, only for it to be changed back to dir2 once the replicaAlterLogDir thread finishes its job. See
   https://github.com/apache/kafka/blob/acecd370cc3b25f12926e7a4664a2648f08c6c9a/core/src/main/scala/kafka/server/ReplicaManager.scala#L2734
   and
   https://github.com/apache/kafka/blob/acecd370cc3b25f12926e7a4664a2648f08c6c9a/core/src/main/scala/kafka/server/ReplicaManager.scala#L2745
   
   Since the future replica is almost caught up with the main replica in these scenarios, I'm leaning towards option (b) to avoid more reassignments. Please let me know if you feel otherwise.
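   
   To make (b) concrete, here is a rough sketch on a toy model rather than the real LogManager. `PromoteFutureSketch`, `Logs`, `promote` and the String-keyed maps are made up for illustration and are not part of this PR; actually deleting the old main replica's directory is left to the caller.
   
   ```scala
   // Toy model only: every name here is hypothetical, none of it is Kafka code.
   object PromoteFutureSketch {
     // Partition name -> log directory that holds that replica.
     final case class Logs(current: Map[String, String], future: Map[String, String])
   
     // Option (b): the future replica becomes the main replica.
     // Returns the updated state and, if a main replica existed, the
     // directory it lived in so that the caller can remove it.
     def promote(logs: Logs, tp: String): (Logs, Option[String]) =
       logs.future.get(tp) match {
         case Some(futureDir) =>
           val staleMainDir = logs.current.get(tp)
           val updated = Logs(logs.current.updated(tp, futureDir), logs.future - tp)
           (updated, staleMainDir)
         case None =>
           // No future replica for this partition: nothing to promote.
           (logs, None)
       }
   }
   ```
   
   With the main replica of a partition in dir1 and its future replica in dir2, `promote` leaves dir2 as the only entry for that partition and hands back dir1 as the directory to clean up, so no replicaAlterLogDir round trip is needed.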
   


