[ 
https://issues.apache.org/jira/browse/SOLR-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768085#comment-17768085
 ] 

Alex Deparvu commented on SOLR-16992:
-------------------------------------

I agree with your analysis; it makes perfect sense to me.

Given the proliferation of this pattern across multiple classes, does it make 
sense to move it to a common util class and call that everywhere? Not sure if 
it's possible, but it would help nail this down once for all cases.

re. `I think at a minimum, CloudSolrStream.openStreams() should be changed such 
that...`

> It waits for the results of every Future, even if one of them throws an 
> exception
> it might make sense to instead use shutdownNow() + awaitTermination() if/when 
> any Future fails

It feels like there is some overlap between these two ideas. I'm not sure how 
the implementations would differ, but I like waiting for everything to finish 
and discarding any useless results (prioritizing the exception).
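
To make the "wait for everything, then prioritize the exception" idea concrete, 
here is a minimal sketch of what such a shared helper might look like. 
`FutureUtil` and `awaitAll` are hypothetical names, not existing Solr classes, 
and wrapping failures in an `IOException` is an assumption about what callers 
would want; it only relies on standard `java.util.concurrent` calls.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical shared helper (not part of Solr). */
final class FutureUtil {
  private FutureUtil() {}

  /**
   * Waits for every future to complete before returning or throwing.
   * If any task failed, the results are discarded and the first failure
   * is thrown, with later failures attached as suppressed exceptions.
   */
  static <T> List<T> awaitAll(List<Future<T>> futures) throws IOException {
    List<T> results = new ArrayList<>(futures.size());
    IOException failure = null;
    for (Future<T> future : futures) {
      try {
        results.add(future.get());
      } catch (ExecutionException e) {
        // Remember the failure, but keep draining the remaining futures.
        if (failure == null) {
          failure = new IOException(e.getCause());
        } else {
          failure.addSuppressed(e.getCause());
        }
      } catch (InterruptedException e) {
        // Simplification: stop waiting if the calling thread is interrupted.
        Thread.currentThread().interrupt();
        IOException interrupted =
            new IOException("Interrupted while waiting for tasks", e);
        if (failure != null) {
          interrupted.addSuppressed(failure);
        }
        throw interrupted;
      }
    }
    if (failure != null) {
      throw failure;
    }
    return results;
  }
}
```

With something like this, no submitted task is still running when the caller 
sees the exception, so a subsequent close() would not race with a task that is 
still opening a stream. The shutdownNow() + awaitTermination() variant would 
differ mainly in interrupting the remaining tasks instead of letting them 
finish.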

re. `Going above and beyond that, it's worth considering:`

> Change SolrClientCache so that any method that will add to solrClients throws 
> an IllegalStateException if isClosed

-0. This class doesn't feel like the right place to tackle this.

> make all TupleStream instances throw an IllegalStateException from open() 
> and/or read() if isClosed

+1, I like this option the most. It should prevent any bad patterns from 
escaping and fail in a clearer way in the test.
I think it can be legal to call close() twice (treating it as an idempotent 
operation), but not open() after close(); that feels like a strong breach of 
the API contract.
Another idea: can we short-circuit `StreamOpener` to bail out early if the 
stream is already closed? That way we avoid some noise in the logs.
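
A rough sketch of the contract described above, under stated assumptions: 
`GuardedStream` and `GuardedOpener` are hypothetical stand-ins for a 
`TupleStream` and a `StreamOpener`-style task, not the real Solr classes, and 
the real `open()`/`read()` signatures differ (they declare `IOException` and 
return tuples).

```java
import java.io.Closeable;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a TupleStream (not Solr's actual class). */
class GuardedStream implements Closeable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private int releaseCount = 0;

  public void open() {
    ensureNotClosed("open");
    // ... acquire connections here ...
  }

  public String read() {
    ensureNotClosed("read");
    return "EOF"; // placeholder for a real tuple
  }

  @Override
  public void close() {
    // compareAndSet makes close() idempotent: resources are released once.
    if (closed.compareAndSet(false, true)) {
      releaseCount++;
    }
  }

  public boolean isClosed() {
    return closed.get();
  }

  public int releaseCount() {
    return releaseCount;
  }

  private void ensureNotClosed(String op) {
    if (closed.get()) {
      throw new IllegalStateException(op + "() called after close()");
    }
  }
}

/** Hypothetical opener task that skips streams that are already closed. */
final class GuardedOpener implements Callable<Boolean> {
  private final GuardedStream stream;

  GuardedOpener(GuardedStream stream) {
    this.stream = stream;
  }

  @Override
  public Boolean call() {
    if (stream.isClosed()) {
      return false; // bail out quietly instead of opening and logging errors
    }
    stream.open();
    return true;
  }
}
```

Note that the check in `open()` is not atomic with respect to a concurrent 
`close()`, so this only makes misuse fail loudly; it does not remove the 
underlying race on its own.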



> Non-reproducible StreamingTest failures -- suggests CloudSolrStream 
> concurrency race condition
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16992
>                 URL: https://issues.apache.org/jira/browse/SOLR-16992
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: 
> OUTPUT-org.apache.solr.client.solrj.io.stream.StreamingTest.txt, 
> thetaphi_solr_Solr-main-Linux_14679.log.txt
>
>
> Roughly 3% of all Jenkins jobs that run {{StreamingTest}} wind up having 
> suite level failures.
> These failures have historically taken the form of 
> {{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads 
> all have names like
> {{"h2sc-718-thread-2"}} indicating that they come from the internal 
> {{ExecutorService}} of an {{Http2SolrClient}}.
> In my experience, the seeds from these failures have never reproduced - 
> suggesting that the problem is related to concurrency.
> SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in 
> theory should help pinpoint where {{Http2SolrClient}} instances might not be 
> getting closed (by causing {{ObjectReleaseTracker}} to fail with stacktraces 
> of when/where any unclosed instances were created - ie: which test method)
> In practice, I have managed to force one failure from {{StreamingTest}} since 
> the SOLR-16983 changes (logs to be attached soon) - but it still didn't 
> indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead 
> indicated was a _single_ unclosed {{InputStream}} instance related to 
> {{Http2SolrClient}} connections (SOLR-16983 also added better tracking of 
> this) coming from {{StreamingTest.testExceptionStream}} - a test method that 
> opens _five_ very similar {{ExceptionStream}} instances, wrapping 
> {{CloudSolrStream}} instances, which are expected to trigger server side errors.
> By its very design, {{ExceptionStream}} catches & records any exceptions 
> from the stream it wraps, so even in the event of these "expected" server 
> side errors, {{ExceptionStream.close()}} should still be correctly getting 
> called (and propagating down to the {{CloudSolrStream}} it wraps).
> I believe the underlying problem has to do with a concurrency race condition 
> between the call to {{CloudSolrStream.close()}} and the {{ExecutorService}} used 
> internally by {{CloudSolrStream.openStreams()}} (details to follow)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
