[ 
https://issues.apache.org/jira/browse/SOLR-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851833#comment-17851833
 ] 

David Smiley commented on SOLR-16122:
-------------------------------------

This test seems to fail due to thread leaks.

Happened yesterday in CI:
{noformat}
  2> INFO: All leaked threads terminated.
   >     com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry: 
   >        1) Thread[id=9557, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
   >             at java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
   >             at java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
   >             at java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
   >             at java.base@11.0.16.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
   >             at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:535)
   >        2) Thread[id=9549, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
   >             at java.base@11.0.16.1/java.lang.Object.wait(Native Method)
   >             at java.base@11.0.16.1/java.lang.Object.wait(Object.java:328)
   >             at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1583)
   >             at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1555)
   >             at app//org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1522)
   >             at app//org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1227)
   >             at app//org.apache.solr.common.cloud.SolrZkClient.updateKeeper(SolrZkClient.java:863)
   >             at app//org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:190)
   >             at app//org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:59)
   >             at app//org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:179)
   >             at app//org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:564)
   >             at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:539)
   >         at __randomizedtesting.SeedInfo.seed([B35AE6C0068D8659]:0)
{noformat}

It also happened to me on Crave recently (this time the leaked thread was the OverseerExitThread):

{noformat}
2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry: 
  2>    1) Thread[id=349, name=OverseerExitThread, state=TIMED_WAITING, group=Overseer state updater.]
  2>         at java.base@11.0.23/java.lang.Thread.sleep(Native Method)
  2>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:101)
  2>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:80)
  2>         at app//org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:345)
  2>         at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:118)
  2>         at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
  2>         at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
  2>         at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:133)
  2>         at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
  2>         at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
  2>         at app//org.apache.solr.cloud.ZkController.rejoinOverseerElection(ZkController.java:2364)
  2>         at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:511)
  2>         at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater$$Lambda$1667/0x000000010099b840.run(Unknown Source)
  2>         at java.base@11.0.23/java.lang.Thread.run(Thread.java:829)
{noformat}

It seems clear to me how the one above could happen: a new Thread is spawned 
and nothing waits for it to finish 
[here|https://github.com/apache/solr/blob/70b6e4f6952cb7f9b3647865404487c68264668d/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L417].
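To illustrate the general pattern (not Solr's actual code, and not a proposed patch), here is a minimal sketch of keeping a reference to a spawned thread so that shutdown can interrupt it and wait for it. The class {{TrackedExitThread}} and its methods are hypothetical names invented for this example; only the thread name is borrowed from the log above. It assumes the spawned task reacts to interruption, which may or may not hold for the retry loop in the trace.

```java
/**
 * Hypothetical helper, not part of Solr: remembers the thread it spawns so
 * that shutdown code can interrupt it and wait for it to finish.
 */
final class TrackedExitThread {
  // Reference to the most recently spawned thread; null until spawn() is called.
  private volatile Thread exitThread;

  /** Starts the task on a new named thread and keeps a reference to it. */
  void spawn(Runnable task) {
    // Thread name borrowed from the leak report purely for illustration.
    Thread t = new Thread(task, "OverseerExitThread");
    exitThread = t;
    t.start();
  }

  /**
   * Interrupts the spawned thread and waits up to timeoutMs for it to end.
   * Returns true if no thread is left running afterwards.
   */
  boolean awaitExit(long timeoutMs) {
    Thread t = exitThread;
    if (t == null) {
      return true; // nothing was ever spawned
    }
    t.interrupt(); // wakes the thread if it is sleeping between retries
    try {
      // join(0) would wait forever, so always pass at least 1 ms.
      t.join(Math.max(1L, timeoutMs));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    TrackedExitThread tracker = new TrackedExitThread();
    // Stand-in for a long retry delay; it simply returns when interrupted.
    tracker.spawn(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // interrupted during the delay: fall through and let the thread end
      }
    });
    System.out.println("exited=" + tracker.awaitExit(5_000));
  }
}
```

A test teardown could call {{awaitExit}} before the suite-scope leak check runs. Whether interrupting is safe for the real thread is a separate question that this sketch does not answer.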



> TestLeaderElectionZkExpiry failing frequently
> ---------------------------------------------
>
>                 Key: SOLR-16122
>                 URL: https://issues.apache.org/jira/browse/SOLR-16122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 9.0
>            Reporter: Jan Høydahl
>            Priority: Major
>
> Failing in 10% of runs - marking as {{@BadApple}} before the 9.0 release



