[https://issues.apache.org/jira/browse/SOLR-11911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404696#comment-16404696]
Andrzej Bialecki commented on SOLR-11911:
------------------------------------------
This test doesn't use MiniSolrCloudCluster; it uses the simulator. However,
you're right that the underlying issue was the Callables that kept running
after their executors were shut down, specifically the loop in
{{ComputePlanAction}}. In regular (non-simulated) tests that use small clusters
and small collections this wasn't visible, but here, with 100 nodes and
thousands of replicas, the time it takes to compute all operations becomes
significant: longer than the thread linger time.
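To illustrate the failure mode described above, here is a minimal sketch using a plain JDK single-thread executor rather than Solr's own executor setup. It is not the actual {{ComputePlanAction}} code: {{IgnoresInterruptDemo}}, {{computeAllOperations}} and {{terminatesWithinLinger}} are hypothetical names, and the "linger time" is modeled, as an assumption, by the {{awaitTermination}} timeout. A task whose loop never checks the interrupted status keeps running after {{shutdownNow()}}, so when the work outlasts the wait, the thread is still alive and looks leaked.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo, NOT Solr code: a long loop that ignores interrupts. */
class IgnoresInterruptDemo {

    /** Simulates computing operations for about workMillis; never checks the interrupt flag. */
    static long computeAllOperations(long workMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMillis);
        long ops = 0;
        while (System.nanoTime() < deadline) { // no interrupt check: this is the bug
            ops++;
        }
        return ops;
    }

    /**
     * Submits the work, calls shutdownNow() shortly after, and reports whether the
     * worker thread terminated within lingerMillis (a stand-in for the linger time).
     */
    static boolean terminatesWithinLinger(long workMillis, long lingerMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "trigger-event-worker");
            t.setDaemon(true); // so a stuck worker cannot keep the demo JVM alive
            return t;
        });
        executor.submit(() -> computeAllOperations(workMillis));
        try {
            Thread.sleep(50);       // give the task time to start
            executor.shutdownNow(); // sets the worker's interrupt flag, which the loop ignores
            return executor.awaitTermination(lingerMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Small workload: finishes well before the wait expires.
        System.out.println("small: " + terminatesWithinLinger(10, 1000));
        // Large workload: outlasts the wait, so the thread is still alive ("leaked").
        System.out.println("large: " + terminatesWithinLinger(3000, 200));
    }
}
```

The small workload should report that the executor terminated, while the large one should not: the interrupt was delivered, but nothing in the loop ever looked at it.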
Regarding the shutdown of the cluster (simulated or not): it should
interrupt the processing of autoscaling events, because they won't be acted
upon anyway.
bq. so even if one of these executor tasks was effectively blocked forever,
shouldn't that be causing the test to timeout, not report a leaked thread?
The executor that processes trigger events (which manages the threads that were
leaking here) is closed using {{shutdownNow}} for the reason above. This
interrupts its threads, but the looping code never checked the interrupted
status, so it simply kept looping.
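The general shape of a fix is a loop that checks the interrupted status between units of work. The sketch below is hypothetical and is not the actual SOLR-11911 patch: {{InterruptAwarePlanDemo}}, {{computeOperations}} and {{stopsPromptlyOnShutdownNow}} are illustrative names, the "operations" are placeholder integers, and the executor is a plain JDK one. It shows that once the loop honors the interrupt, {{shutdownNow()}} stops the worker well within the wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo, NOT Solr code: a planning loop that honors interrupts. */
class InterruptAwarePlanDemo {

    /** Computes up to maxOps placeholder "operations", stopping early if interrupted. */
    static List<Integer> computeOperations(int maxOps) throws InterruptedException {
        List<Integer> ops = new ArrayList<>();
        for (int i = 0; i < maxOps; i++) {
            // Thread.interrupted() reads and clears the flag; throwing InterruptedException
            // reports the cancellation to the caller (the usual java.util.concurrent idiom).
            if (Thread.interrupted()) {
                throw new InterruptedException("stopped after " + i + " operations");
            }
            ops.add(i);     // placeholder for computing one real operation
            simulateWork(); // ~100 microseconds of CPU-bound work per operation
        }
        return ops;
    }

    private static void simulateWork() {
        long until = System.nanoTime() + TimeUnit.MICROSECONDS.toNanos(100);
        while (System.nanoTime() < until) {
            // busy-wait to mimic CPU-bound computation
        }
    }

    /** Starts an effectively unbounded computation, calls shutdownNow(), waits up to lingerMillis. */
    static boolean stopsPromptlyOnShutdownNow(long lingerMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> computeOperations(Integer.MAX_VALUE));
        try {
            Thread.sleep(50);       // let the computation get going
            executor.shutdownNow(); // interrupts the worker; the loop sees it on its next check
            return executor.awaitTermination(lingerMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated: " + stopsPromptlyOnShutdownNow(1000));
    }
}
```

An alternative is to restore the flag with {{Thread.currentThread().interrupt()}} and return a partial result instead of throwing; which is preferable depends on whether callers can use a partial plan. Either way, the check only runs between operations, so a single very long operation would still delay shutdown.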
> TestLargeCluster.testSearchRate() failure
> -----------------------------------------
>
> Key: SOLR-11911
> URL: https://issues.apache.org/jira/browse/SOLR-11911
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Major
>
> My Jenkins found a branch_7x seed that reproduced 4/5 times for me:
> {noformat}
> Checking out Revision af9706cb89335a5aa04f9bcae0c2558a61803b50
> (refs/remotes/origin/branch_7x)
> [...]
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestLargeCluster
> -Dtests.method=testSearchRate -Dtests.seed=2D7724685882A83D -Dtests.slow=true
> -Dtests.locale=be-BY -Dtests.timezone=Africa/Ouagadougou -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8
> [junit4] FAILURE 1.24s J0 | TestLargeCluster.testSearchRate <<<
> [junit4] > Throwable #1: java.lang.AssertionError: The trigger did not
> fire at all
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([2D7724685882A83D:703F3AE197440E72]:0)
> [junit4] > at
> org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate(TestLargeCluster.java:547)
> [junit4] > at java.lang.Thread.run(Thread.java:748)
> [...]
> [junit4] 2> NOTE: test params are: codec=CheapBastard,
> sim=RandomSimilarity(queryNorm=true): {}, locale=be-BY,
> timezone=Africa/Ouagadougou
> [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation
> 1.8.0_151 (64-bit)/cpus=16,threads=1,free=388243840,total=502267904
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)