hachikuji commented on a change in pull request #11681:
URL: https://github.com/apache/kafka/pull/11681#discussion_r808622838



##########
File path: core/src/test/scala/unit/kafka/admin/LeaderElectionCommandTest.scala
##########
@@ -55,9 +54,18 @@ final class LeaderElectionCommandTest(cluster: ClusterInstance) {
     clusterConfig.serverProperties().put(KafkaConfig.OffsetsTopicReplicationFactorProp, "2")
   }
 
+  def waitForAdminClientHaveNumBrokers(numBrokers: Int): Admin = {
+    // Use a temporary adminClient to wait for all brokers to be up.
+    // If we don't wait for the brokers to be up, we might hit a race condition where the metadata in the adminClient only has broker2 or broker3 up,
+    // and after broker2/broker3 shut down, no brokers are available to connect to, which causes a request timeout.
+    TestUtils.waitForNumNodesUp(cluster.createAdminClient(), numBrokers)
+
+    cluster.createAdminClient()
+  }
+
   @ClusterTest
   def testAllTopicPartition(): Unit = {
-    val client = cluster.createAdminClient()
+    val client = waitForAdminClientHaveNumBrokers(3)
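
The `TestUtils.waitForNumNodesUp` helper added by the PR is not shown in this hunk. As a rough sketch of the idea only (the PR's actual implementation may differ), such a helper could poll `Admin#describeCluster` until the expected number of brokers is visible:

    import java.util.concurrent.TimeUnit
    import org.apache.kafka.clients.admin.Admin
    import kafka.utils.TestUtils

    // Sketch only: poll the cluster metadata until `expectedNumNodes` brokers are
    // visible, so later admin operations don't race against broker startup.
    def waitForNumNodesUp(admin: Admin, expectedNumNodes: Int): Unit = {
      TestUtils.waitUntilTrue(
        () => admin.describeCluster().nodes().get(15, TimeUnit.SECONDS).size == expectedNumNodes,
        msg = s"Timed out waiting for $expectedNumNodes brokers to show up in cluster metadata"
      )
    }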

Review comment:
       Ah, I didn't see @showuon's comment below about metadata cache 
consistency. I guess the issue is that unfencing does not necessarily imply 
that the broker is caught up to the end of the metadata log. I know previously 
we were considering having the broker consume its own registration before it 
declared itself ready to unfence, but it looks like we haven't implemented 
that. Perhaps a short-term fix is to change 
`KafkaClusterTestKit.waitForReadyBrokers` to check metadata caches directly?
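
For illustration, a check along those lines might look roughly as follows. The accessors used here (`metadataCache`, `hasAliveBroker`) and the shape of the helper are assumptions about the internal API, not the real `KafkaClusterTestKit` code, so treat this as a sketch of the idea:

    import kafka.server.BrokerServer
    import kafka.utils.TestUtils

    // Sketch only: rather than treating "unfenced" as "ready", wait until every
    // broker's metadata cache has caught up far enough to see all expected brokers.
    // `metadataCache` and `hasAliveBroker` are stand-ins for whatever the broker
    // actually exposes and may differ from the real accessors.
    def waitForMetadataCaches(brokers: Seq[BrokerServer], expectedBrokerIds: Set[Int]): Unit = {
      TestUtils.waitUntilTrue(
        () => brokers.forall { broker =>
          expectedBrokerIds.forall(id => broker.metadataCache.hasAliveBroker(id))
        },
        msg = s"Timed out waiting for every metadata cache to contain brokers $expectedBrokerIds"
      )
    }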



