[
https://issues.apache.org/jira/browse/SOLR-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533066#comment-17533066
]
Michael Gibney commented on SOLR-15660:
---------------------------------------
There are several issues being conflated here, I think. The stack trace that Jan
and Gus refer to above _can_ be addressed, I believe; I've taken an
initial stab at doing so in
[apache/solr#842|https://github.com/apache/solr/pull/842]. But I think it's
impossible to get around ThreadLeakLinger entirely, and probably a mistake to
try to do so, given the [number of errors we're seeing
now|https://lists.apache.org/[email protected]:dfr=2022-4-1] that are
really _completely_ spurious.
There's some high-level discussion of ThreadLeakLinger in MAHOUT-1345 that
makes it clear that even if we fixed all the actual thread leaks, we'd still
get "leaked" threads detected spuriously in the window [between the
termination.signalAll() that wakes awaitTermination() callers and actual thread
death|https://github.com/openjdk/jdk17u/blob/20f3576cd1bbe516360b0d9f7deaacdad94df4d7/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L728-L733].
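To make that window concrete, here is a minimal plain-JDK sketch (no Solr or randomizedtesting classes involved; `WorkerDeathRace` and `Observation` are hypothetical names for illustration). It captures a single-thread executor's worker via a custom ThreadFactory, shuts the executor down, and checks `isAlive()` immediately after `awaitTermination()` returns `true`. Whether the worker is still alive at that instant is timing-dependent, so the count printed by `main` varies by machine and JVM and may be 0 on a given run; the point is only that nothing in the `awaitTermination()` contract promises the worker thread has already exited.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public final class WorkerDeathRace {

    /** One run: the pool's worker thread, and whether it was alive when awaitTermination returned. */
    public static final class Observation {
        public final Thread worker;
        public final boolean aliveRightAfterAwait;

        Observation(Thread worker, boolean aliveRightAfterAwait) {
            this.worker = worker;
            this.aliveRightAfterAwait = aliveRightAfterAwait;
        }
    }

    public static Observation observe() {
        AtomicReference<Thread> worker = new AtomicReference<>();
        // Capture the pool's (single) worker thread so we can inspect it afterwards.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "race-worker");
            worker.set(t);
            return t;
        });
        try {
            pool.execute(() -> { });   // trivial task; forces the worker thread to be created
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not terminate");
            }
            // awaitTermination() has returned, yet the worker may still be
            // unwinding out of its run loop and therefore still alive.
            boolean alive = worker.get().isAlive();
            // join() waits for the thread to actually exit, closing the window.
            worker.get().join();
            return new Observation(worker.get(), alive);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        int runs = 1000;
        int alive = 0;
        for (int i = 0; i < runs; i++) {
            if (observe().aliveRightAfterAwait) {
                alive++;
            }
        }
        // Timing-dependent: varies by machine and JVM, and may be 0.
        System.out.println("alive right after awaitTermination: " + alive + "/" + runs);
    }
}
```

Waiting on the worker itself (here an unbounded `join()`) is what removes the false positive; a leak checker that cannot join arbitrary threads gets the same effect from a short linger.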
A universal thread leak linger of 10s is probably overkill, granted, and may (?)
have masked a number of real issues. But I'd argue that a universal thread leak
linger of perhaps 1s is a small price to pay for avoiding tons of spurious
failures, and for sparing every developer from having to be intimately
familiar with the issues surrounding (perfectly normal) delayed thread death in
Executors.
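For comparison, a rough sketch of what a short universal linger amounts to. This is an assumption about the general mechanism, not randomizedtesting's actual ThreadLeakLingering implementation; `LingerCheck` and `survivorsAfterLinger` are hypothetical names. The suspect threads share one time budget to exit, and only those still alive afterwards are reported, so a thread that is merely slow to die drops out with a linger of about 1s while a genuinely stuck thread is still flagged.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class LingerCheck {

    /**
     * Gives candidate threads up to lingerMillis (in total) to finish, then
     * returns the ones still alive, i.e. what a leak checker would report.
     * lingerMillis <= 0 means "look once, don't wait".
     */
    public static List<Thread> survivorsAfterLinger(Collection<Thread> candidates, long lingerMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(Math.max(0, lingerMillis));
        try {
            for (Thread t : candidates) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    break;   // budget spent; note that join(0) would wait forever
                }
                t.join(remainingMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // stop lingering, keep the interrupt status
        }
        List<Thread> survivors = new ArrayList<>();
        for (Thread t : candidates) {
            if (t.isAlive()) {
                survivors.add(t);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Finishes on its own shortly after the "test" ends: a spurious leak.
        Thread slowToDie = new Thread(() -> sleepQuietly(50), "slow-to-die");
        // Never finishes until released: a genuine leak.
        CountDownLatch release = new CountDownLatch(1);
        Thread stuck = new Thread(() -> awaitQuietly(release), "stuck");
        slowToDie.start();
        stuck.start();

        System.out.println("linger 0ms:    " + names(survivorsAfterLinger(List.of(slowToDie, stuck), 0)));
        System.out.println("linger 1000ms: " + names(survivorsAfterLinger(List.of(slowToDie, stuck), 1000)));
        release.countDown();   // let the "stuck" thread exit so the JVM can shut down
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static List<String> names(List<Thread> threads) {
        List<String> out = new ArrayList<>();
        for (Thread t : threads) {
            out.add(t.getName());
        }
        return out;
    }
}
```

The budget is shared across all candidates rather than applied per thread, so the worst-case added test time stays bounded by the linger value regardless of how many threads are checked.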
Notably, though, even the TestLeaderElectionZkExpiry failure, while "real"
in a sense, may not have been worth the trouble to address. Yes, it was
surprising to me that ZooKeeper.close() doesn't block until the connection
threads die. But although we can tighten that up, I'd argue it represents a
transient/aesthetic resource leak, not a practically significant one. And the
unavoidable Executor "resource leak" is a game changer from my perspective. I
mean, 10ms might _indeed_ be enough time to avoid these spurious errors;
evidently _0s_ is usually enough time to avoid them :)
> Remove universal 10 second test thread leak linger.
> ---------------------------------------------------
>
> Key: SOLR-15660
> URL: https://issues.apache.org/jira/browse/SOLR-15660
> Project: Solr
> Issue Type: Test
> Components: Tests
> Reporter: Mark Robert Miller
> Assignee: Mark Robert Miller
> Priority: Minor
> Fix For: 9.0
>
> Attachments: screenshot-1.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)