[ https://issues.apache.org/jira/browse/SOLR-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851833#comment-17851833 ]
David Smiley commented on SOLR-16122:
-------------------------------------

This test seems to fail due to thread leaks. It happened yesterday in CI:
{noformat}
  2> INFO: All leaked threads terminated.
   > com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry:
   >    1) Thread[id=9557, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
   >         at java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
   >         at java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
   >         at java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
   >         at java.base@11.0.16.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
   >         at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:535)
   >    2) Thread[id=9549, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
   >         at java.base@11.0.16.1/java.lang.Object.wait(Native Method)
   >         at java.base@11.0.16.1/java.lang.Object.wait(Object.java:328)
   >         at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1583)
   >         at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1555)
   >         at app//org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1522)
   >         at app//org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1227)
   >         at app//org.apache.solr.common.cloud.SolrZkClient.updateKeeper(SolrZkClient.java:863)
   >         at app//org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:190)
   >         at app//org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:59)
   >         at app//org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:179)
   >         at app//org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:564)
   >         at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:539)
   >         at __randomizedtesting.SeedInfo.seed([B35AE6C0068D8659]:0)
{noformat}

It also happened to me on Crave recently (this time the OverseerExitThread):
{noformat}
  2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry:
  2>    1) Thread[id=349, name=OverseerExitThread, state=TIMED_WAITING, group=Overseer state updater.]
  2>         at java.base@11.0.23/java.lang.Thread.sleep(Native Method)
  2>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:101)
  2>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:80)
  2>         at app//org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:345)
  2>         at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:118)
  2>         at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
  2>         at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
  2>         at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:133)
  2>         at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
  2>         at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
  2>         at app//org.apache.solr.cloud.ZkController.rejoinOverseerElection(ZkController.java:2364)
  2>         at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:511)
  2>         at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater$$Lambda$1667/0x000000010099b840.run(Unknown Source)
  2>         at java.base@11.0.23/java.lang.Thread.run(Thread.java:829)
{noformat}

It seems clear to me how this second leak could happen, since a new Thread is spawned and nothing waits for it to finish [here|https://github.com/apache/solr/blob/70b6e4f6952cb7f9b3647865404487c68264668d/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L417].
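To illustrate the general pattern behind the second leak, here is a minimal, self-contained sketch. It is not Solr's Overseer code: the class {{ExitThreadSketch}}, its methods, and the thread name are hypothetical, and the sleep merely stands in for a slow retry loop. It shows a thread that is started without anyone waiting for it, plus one possible mitigation (keep a reference, then interrupt and join with a bounded wait on close). Whether that mitigation suits the Overseer is an open question.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a component that spawns an exit thread; not Solr's Overseer. */
class ExitThreadSketch implements AutoCloseable {
  // Reference kept so that close() can wait for the worker; a fire-and-forget
  // spawn would simply drop it.
  private volatile Thread exitThread;

  /** Starts the worker and returns immediately, so the caller never waits for it. */
  Thread spawnExitThread() {
    Thread t = new Thread(() -> {
      try {
        // Stands in for slow work such as a retry loop with delays.
        Thread.sleep(TimeUnit.MINUTES.toMillis(1));
      } catch (InterruptedException e) {
        // Restore the flag and let the thread end.
        Thread.currentThread().interrupt();
      }
    }, "SketchExitThread");
    t.setDaemon(true);
    exitThread = t;
    t.start();
    return t;
  }

  /** One possible mitigation: interrupt the worker and wait a bounded time for it to end. */
  @Override
  public void close() {
    Thread t = exitThread;
    if (t == null) {
      return;
    }
    t.interrupt();
    try {
      t.join(TimeUnit.SECONDS.toMillis(5));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

If a test suite ends between {{spawnExitThread()}} and the worker finishing, a thread-leak checker would still see the worker alive; calling {{close()}} first gives it a chance to terminate.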
> TestLeaderElectionZkExpiry failing frequently
> ---------------------------------------------
>
>                 Key: SOLR-16122
>                 URL: https://issues.apache.org/jira/browse/SOLR-16122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 9.0
>            Reporter: Jan Høydahl
>            Priority: Major
>
> Failing in 10% of runs - marking as {{@BadApple}} before the 9.0 release