[
https://issues.apache.org/jira/browse/SOLR-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066436#comment-18066436
]
Pierre Salagnac commented on SOLR-18155:
----------------------------------------
After a second look, I don't know why this seed causes issues more often than
others. I don't get a 100% repro locally with this seed, though; it's closer to
80%. I have also seen some other test classes occasionally fail with the same
error, and I have no idea why this is new (assuming it is!).
My understanding is that {{CollectionsAPISolrJTest}} reproduces the issue often
because it creates many collections and does not delete them. The
{{MiniSolrCloudCluster}} is shut down at the end of the test class, which shuts
down the nodes one by one, but the collections are still live. This is not a
problem in itself; it just triggers leader election very late, after the node
has started its shutdown sequence.
More detailed scenario:
# The test runs and creates many collections/shards...
# At end of test class, we invoke {{MiniSolrCloudCluster.shutdown()}},
which concurrently invokes {{stopJettySolrRunner()}} for all the nodes.
# Node _A_ initiates its shutdown sequence. In
{{ZkController.preClose()}}, we invoke {{zkCollectionTerms::close}} which
will close all the existing instances of {{ZkShardTerms}}.
# Node _B_ does the same (concurrently).
# Node _A_ invokes {{ZkController.tryCancelAllElections()}} which removes all
the ephemeral nodes for shard leader elections.
# Before node _B_ completes the same, one of the replicas on node _B_ is
elected as the new leader (because of step 5). This causes a new instance of
{{ZkShardTerms}} to be created on node _B_. It is never closed because step
4 has already run.
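
To make the ordering above concrete, here is a minimal, single-threaded sketch that replays steps 3 to 6 in the problematic order. It does not use any Solr code: {{Terms}}, {{TermsRegistry}} and the {{OPEN}} set are hypothetical stand-ins for {{ZkShardTerms}}, the per-node terms registry and {{ObjectReleaseTracker}}, and the lazy re-creation on lookup is my assumption about how the new instance appears.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ShardTermsLeakSketch {

  /** Hypothetical stand-in for ObjectReleaseTracker: objects tracked but not yet released. */
  static final Set<Object> OPEN = ConcurrentHashMap.newKeySet();

  /** Hypothetical stand-in for ZkShardTerms: tracked on creation, released on close. */
  static final class Terms implements AutoCloseable {
    final String shard;

    Terms(String shard) {
      this.shard = shard;
      OPEN.add(this);
    }

    @Override
    public void close() {
      OPEN.remove(this);
    }
  }

  /** Hypothetical per-node registry that lazily creates Terms instances on lookup. */
  static final class TermsRegistry implements AutoCloseable {
    private final Map<String, Terms> terms = new ConcurrentHashMap<>();

    Terms get(String shard) {
      return terms.computeIfAbsent(shard, Terms::new);
    }

    /** Closes only the instances that exist at the time of the call. */
    @Override
    public void close() {
      terms.values().forEach(Terms::close);
      terms.clear();
    }
  }

  /** Replays steps 3 to 6 sequentially and returns the number of unclosed instances. */
  static int replayShutdownRace() {
    OPEN.clear();
    TermsRegistry nodeA = new TermsRegistry();
    TermsRegistry nodeB = new TermsRegistry();

    // Step 1: the test ran, so both nodes hold terms for the shard.
    nodeA.get("shard1");
    nodeB.get("shard1");

    nodeA.close(); // Step 3: node A closes all of its existing instances.
    nodeB.close(); // Step 4: node B does the same.

    // Step 5: node A cancels its elections (nothing to model here).
    // Step 6: a replica on node B becomes leader and looks up its terms again,
    // which creates a fresh instance after node B's close pass has finished.
    nodeB.get("shard1");

    return OPEN.size();
  }

  public static void main(String[] args) {
    System.out.println("leaked instances: " + replayShutdownRace());
  }
}
```

The sketch only shows why a close pass that runs once cannot catch instances created afterwards; it says nothing about which fix is appropriate in {{ZkController}}.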
> CollectionsAPISolrJTest seed reliably leaks unclosed ZkShardTerms
> -----------------------------------------------------------------
>
> Key: SOLR-18155
> URL: https://issues.apache.org/jira/browse/SOLR-18155
> Project: Solr
> Issue Type: Task
> Reporter: Chris M. Hostetter
> Priority: Major
>
> The following seed reliably fails on {{main}} (as of
> {{684894af5f1591af1c49c2bb6fdfdd83a94a89b2}}) due to ObjectReleaseTracker's
> of ZkShardTerms...
> {noformat}
> ./gradlew :solr:solrj:test --tests
> "org.apache.solr.client.solrj.impl.CloudSolrClientCacheTest.testStaleStateRetryWaitsAfterSkipFailure"
> "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC
> -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m"
> -Ptests.seed=517946C27016E5DC -Ptests.useSecurityManager=true
> -Ptests.file.encoding=ISO-8859-1
> {noformat}