showuon commented on code in PR #15133:
URL: https://github.com/apache/kafka/pull/15133#discussion_r1463172726


##########
core/src/test/scala/unit/kafka/utils/TestUtils.scala:
##########
@@ -1396,11 +1396,13 @@ object TestUtils extends Logging {
   // Note: Call this method in the test itself, rather than the @AfterEach method.
   // Because of the assert, if assertNoNonDaemonThreads fails, nothing after would be executed.
   def assertNoNonDaemonThreads(threadNamePrefix: String): Unit = {
-    val nonDaemonThreads = Thread.getAllStackTraces.keySet.asScala.filter { t =>
-      !t.isDaemon && t.isAlive && t.getName.startsWith(threadNamePrefix)
-    }
-    val threadCount = nonDaemonThreads.size
-    assertEquals(0, threadCount, s"Found unexpected $threadCount NonDaemon threads=${nonDaemonThreads.map(t => t.getName).mkString(", ")}")
+    var nonDemonThreads: mutable.Set[Thread] = mutable.Set.empty[Thread]
+    waitUntilTrue(() => {
+      nonDemonThreads = Thread.getAllStackTraces.keySet.asScala.filter { t =>
+        !t.isDaemon && t.isAlive && t.getName.startsWith(threadNamePrefix)
+      }
+      0 == nonDemonThreads.size
+    }, s"Found unexpected ${nonDemonThreads.size} NonDaemon threads=${nonDemonThreads.map(t => t.getName).mkString(", ")}", 1000)

Review Comment:
   cc @divijvaidya, I found that the 
[CI](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15133/9/testReport/junit/kafka.server/ReplicaManagerTest/Build___JDK_11_and_Scala_2_13___testSuccessfulBuildRemoteLogAuxStateMetrics__/)
 is sometimes too sensitive to the non-daemon threads check, because some 
shutdowns happen asynchronously. You can check the failed results 
[here](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15133/9/): 
Basically, if some resource were really left unclosed, all the following tests 
should also fail (I verified this in my local env). But in the CI results, only 
2 `ReplicaManagerTest` tests failed, and only on JDK 11. So I'm going to verify 
this by using `waitUntilTrue` to give the threads some time to shut down.
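   
   To illustrate the idea outside of Kafka, here is a minimal, self-contained sketch. `pollUntil` and `liveNonDaemonThreads` are hypothetical stand-ins written for this example (they are not the `TestUtils` methods), and the thread name and timings are made up:

```scala
import scala.jdk.CollectionConverters._

object NonDaemonThreadCheck {
  // Hypothetical stand-in for a polling helper such as `waitUntilTrue`:
  // returns true as soon as `condition` holds, false once `waitTimeMs` has elapsed.
  def pollUntil(condition: () => Boolean, waitTimeMs: Long, pauseMs: Long = 50L): Boolean = {
    val start = System.nanoTime()
    val budgetNanos = waitTimeMs * 1000000L
    var ok = condition()
    while (!ok && System.nanoTime() - start < budgetNanos) {
      Thread.sleep(pauseMs)
      ok = condition()
    }
    ok
  }

  // Names of live, non-daemon threads whose name starts with `prefix`.
  def liveNonDaemonThreads(prefix: String): Set[String] =
    Thread.getAllStackTraces.keySet.asScala
      .filter(t => !t.isDaemon && t.isAlive && t.getName.startsWith(prefix))
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit = {
    // Simulates an asynchronous shutdown that needs ~200 ms to finish.
    val task: Runnable = () => Thread.sleep(200)
    val worker = new Thread(task, "sketch-worker-1")
    worker.start()

    // A one-shot check at this point would most likely still see the thread;
    // polling with a bounded wait gives it a chance to exit first.
    val drained = pollUntil(() => liveNonDaemonThreads("sketch-worker").isEmpty, waitTimeMs = 1000L)
    println(s"drained=$drained")
  }
}
```

   The point of the sketch is only that a bounded poll tolerates a slow but correct shutdown, while a thread that never exits still fails the check once the budget is used up.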
   
   I also set the wait time to 1 second because, if resources are really 
leaked, the total extra wait time will be the product of `waitTime` and the 
number of subsequent tests that fail the check. 
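
   As a rough worked example of that product (the test count and the 15-second alternative below are hypothetical numbers chosen only for illustration):

```scala
object WaitBudget {
  // Upper bound on the extra time spent waiting when every remaining test
  // hits the timeout because a thread was genuinely leaked.
  def worstCaseExtraMs(waitTimeMs: Long, failingTests: Int): Long =
    waitTimeMs * failingTests

  def main(args: Array[String]): Unit = {
    val failingTests = 100 // hypothetical number of subsequent tests in the class
    println(worstCaseExtraMs(1000L, failingTests))  // 1-second wait
    println(worstCaseExtraMs(15000L, failingTests)) // hypothetical 15-second wait
  }
}
```

   With these assumed numbers, a 1-second wait adds at most 100 seconds to a run with a real leak, while a 15-second wait would add up to 25 minutes.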


