hachikuji commented on a change in pull request #11681:
URL: https://github.com/apache/kafka/pull/11681#discussion_r808622838
########## File path: core/src/test/scala/unit/kafka/admin/LeaderElectionCommandTest.scala ##########
@@ -55,9 +54,18 @@ final class LeaderElectionCommandTest(cluster: ClusterInstance) {
     clusterConfig.serverProperties().put(KafkaConfig.OffsetsTopicReplicationFactorProp, "2")
   }

+  def waitForAdminClientHaveNumBrokers(numBrokers: Int): Admin = {
+    // Use a temporary admin client to wait until all brokers are up.
+    // Without this wait, there is a race where the admin client's metadata may only contain broker2 or broker3,
+    // so once broker2/broker3 are shut down there are no brokers left to connect to and the request times out.
+    TestUtils.waitForNumNodesUp(cluster.createAdminClient(), numBrokers)
+
+    cluster.createAdminClient()
+  }
+
   @ClusterTest
   def testAllTopicPartition(): Unit = {
-    val client = cluster.createAdminClient()
+    val client = waitForAdminClientHaveNumBrokers(3)

Review comment:
   Ah, I didn't see @showuon's comment below about metadata cache consistency. I guess the issue is that unfencing does not necessarily imply that the broker has caught up to the end of the metadata log. I know we were previously considering having the broker consume its own registration before declaring itself ready to be unfenced, but it looks like we haven't implemented that. Perhaps a short-term fix is to change `KafkaClusterTestKit.waitForReadyBrokers` to check the metadata caches directly?
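   A minimal sketch of what that direct check could look like, written as a Scala test-side helper rather than the actual `KafkaClusterTestKit.waitForReadyBrokers` change; it assumes the test can reach each `BrokerServer` and that the broker's metadata cache exposes a `getAliveBrokers`-style accessor (the helper name and signature below are illustrative only):

       import kafka.server.BrokerServer
       import kafka.utils.TestUtils

       // Illustrative sketch only: poll every broker's metadata cache until it reports
       // the expected number of live brokers, instead of relying on unfencing alone.
       def waitForBrokersInMetadataCaches(brokers: Iterable[BrokerServer], expectedBrokers: Int): Unit = {
         TestUtils.waitUntilTrue(
           () => brokers.forall(_.metadataCache.getAliveBrokers.size == expectedBrokers),
           s"Timed out waiting for every broker's metadata cache to see $expectedBrokers live brokers"
         )
       }

   Checking the caches directly would verify the property the test actually depends on (each broker's own view of which brokers are alive), rather than inferring it from the unfencing state or from a temporary admin client's metadata view.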