[ https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877915#comment-17877915 ]
Chris M. Hostetter commented on SOLR-17416:
-------------------------------------------

I've seen this in the wild and after digging into it a bit can elaborate on the circumstances...
* {{ExportBuffers.run(Callable)}} is designed around the use of two threads:
** a background thread that acts as a "filler" to read data from disk and populate a {{fillBuffer}}
** the caller thread, which serially invokes the "writer" logic passed in as the {{Callable}} argument, to serialize an {{outputBuffer}}
* These two threads use {{exchangeBuffers()}} to swap with each other, which under the covers uses a {{CyclicBarrier}}
** The {{CyclicBarrier.await(time)}} calls use a hard-coded timeout of 600 seconds
*** This seems to have been chosen as an arbitrary "big value" to ensure that neither thread sat around waiting forever (consuming RAM) if the other thread died.

The problem, however, is that this 600 second timeout may not be enough to account for really slow downstream consumption of the data. With really large collections, and really complicated streaming expressions, this can happen even with well behaved clients that are actively trying to consume data.

One particular example I've seen is a stream where a {{leftOuterJoin}} is wrapped around two very large collections, spread over a large number of shards...
{noformat}
leftOuterJoin(search(biggest_left_collection,q="*:*",sort="join_field asc",fl="...",qt="/export"),
              search(big_right_collection,q="*:*",sort="join_field asc",fl="...",qt="/export"),
              on="join_field")
{noformat}
...After an initial burst of fetching a first {{batchSize}} of Tuples from both streams, and slurping up all the results from both streams that have the same {{join_field}} value as the head of the left Tuple, each {{read()}} call from the downstream client causes Tuples from the merged set to be consumed – but neither of the upstream "left" or "right" streams reads any new data for a bit.
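To make the failure mode concrete, here is a minimal standalone sketch (not Solr code; all names here are invented for illustration) of the two-party exchange pattern described above: a timed {{CyclicBarrier.await()}} on the "filler" side times out when the "writer" is stalled for longer than the timeout – and per the {{CyclicBarrier}} contract, the timeout also leaves the barrier in the "broken" state:

{code:java}
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BarrierExchangeSketch {

  // Simulates the "filler" waiting at the barrier while the "writer" is
  // stalled by slow downstream consumption for longer than the timeout.
  // Returns what the filler thread experienced.
  static String fillerOutcome() throws InterruptedException {
    CyclicBarrier barrier = new CyclicBarrier(2);
    final String[] outcome = new String[1];
    Thread filler = new Thread(() -> {
      try {
        // 200ms stands in for the hard-coded 600 second timeout
        barrier.await(200, TimeUnit.MILLISECONDS);
        outcome[0] = "exchanged";
      } catch (TimeoutException e) {
        outcome[0] = "TimeoutException";
      } catch (Exception e) {
        outcome[0] = e.getClass().getSimpleName();
      }
    });
    filler.start();
    Thread.sleep(500); // the "writer" never arrives in time
    filler.join();
    // a timed-out await() also leaves the barrier "broken"
    return outcome[0] + (barrier.isBroken() ? " (barrier broken)" : "");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(fillerOutcome());
  }
}
{code}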
And even once additional values are read from the "left" stream, the "right" stream may sit stagnant for even longer, as the node running the join code keeps consuming (and passing downstream) Tuples from the left stream until it encounters the same {{join_field}} value as the head of the right stream, and _then_ starts fetching more data from the individual replicas of that right stream collection. Which can take a while – and in the meantime the "filler" thread on one (or more) of the replicas of {{big_right_collection}} may have encountered a {{TimeoutException}} from the {{barrier.await(time)}}.

In _theory_ the "filler" code in {{ExportBuffers}} reports the {{TimeoutException}} in this situation by passing the exception to the following method before the filler thread ends...
{code:java}
public void error(Throwable t) {
  error = t;
  // break the lock on the other thread too
  barrier.reset();
}
{code}
...expecting the "writer" code (from the caller thread) to check the value of {{getError()}} and report it. But there are two main problems with this approach to passing the exception along:
1. The common case "writer" logic in {{ExportWriter}} (which calls {{ExportBuffers.run()}}) *_NEVER_* checks {{ExportBuffers.getError()}} – it is only ever checked by (the special case logic in) {{ExportWriterStream}}, and even there it's only checked in the event of a {{BrokenBarrierException}}
2. The logic inside {{ExportBuffers.run()}} itself only logs (and ignores) any type of Throwable that propagates up to it.
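Generically, the intended hand-off amounts to the filler recording its failure in a shared slot, and the writer consulting that slot however it finishes. A minimal sketch of that pattern (these are not Solr's actual classes; names are invented here):

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class ErrorPropagationSketch {

  // shared slot the "filler" thread writes into before it dies
  private final AtomicReference<Throwable> error = new AtomicReference<>();

  // called by the filler when its await() times out (or anything else fails)
  void fillerFailed(Throwable t) {
    error.compareAndSet(null, t);
  }

  // the writer must check the slot no matter how its loop ends, and turn a
  // filler-side failure into an exception the client can actually see;
  // otherwise the response ends with a clean EOF marker (a "false success")
  void writeResults(Runnable writeLoop) throws IOException {
    try {
      writeLoop.run();
    } finally {
      Throwable t = error.get();
      if (t != null) {
        throw new IOException("filler thread failed", t);
      }
    }
  }
}
{code}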
A very tightly related third problem is the call to {{barrier.reset()}}:
* AFAICT the usage of this method call seems to be a complete mistake / misunderstanding
** Its comment & usage here in the {{error()}} method (ie: after one of the two parties gets either a {{TimeoutException}} or {{BrokenBarrierException}}) seems to be with the intent of ensuring that _if_ the other thread is currently {{await(time)}}-ing on the barrier, that thread should get a {{BrokenBarrierException}} – but that's not what this method is designed for.
** The way we are using the {{CyclicBarrier}} in this code, the situation described in the comment should never even be possible.
* In general, anytime any call to {{CyclicBarrier.await()}} throws an exception the barrier is left in a "broken" state, and any other calls to {{CyclicBarrier.await()}} ({_}either currently waiting, or in the future{_}) will get a {{BrokenBarrierException}}
** The call to {{barrier.reset()}} here causes that "broken" state to be reset, *_preventing_* any future calls to {{CyclicBarrier.await()}} (from the thread that didn't already encounter an error) from throwing {{BrokenBarrierException}} – instead that thread waits the full timeout amount (for a thread that has already errored out and is long gone)

----

In general, I think the design of {{ExportBuffers}} (and the use of a {{CyclicBarrier.await(time)}} to exchange buffers between two threads) is more complicated and error-prone than it needs to be, and should be re-considered in order to ensure that these kinds of "slow consumption" situations can't result in the "filler" thread giving up after an arbitrary time limit even though the "writer" thread is still around and the client is still consuming results. I will open a separate jira to discuss options for redesigning this.
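The {{reset()}} behavior described above can be demonstrated in isolation (standalone sketch with invented names, not Solr code): after one party's timed {{await()}} times out, the barrier is broken, so the other party's later {{await()}} fails fast with {{BrokenBarrierException}}. But if {{reset()}} is called after the timeout, that broken state is cleared, and the other party instead waits out its full timeout for a thread that is already gone:

{code:java}
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BarrierResetSketch {

  // First await() plays the "filler" timing out alone; second await() plays
  // the "writer" arriving later. Both run on one thread, which CyclicBarrier
  // permits, since it has no notion of thread ownership.
  static String secondAwaitOutcome(boolean resetAfterTimeout) throws Exception {
    CyclicBarrier barrier = new CyclicBarrier(2);
    try {
      barrier.await(50, TimeUnit.MILLISECONDS); // "filler" times out alone
    } catch (TimeoutException expected) {
      if (resetAfterTimeout) {
        barrier.reset(); // what ExportBuffers.error() does
      }
    }
    try {
      barrier.await(200, TimeUnit.MILLISECONDS); // the "writer" arrives later
      return "exchanged";
    } catch (BrokenBarrierException e) {
      return "BrokenBarrierException (fails fast)";
    } catch (TimeoutException e) {
      return "TimeoutException (waited the full timeout)";
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("without reset: " + secondAwaitOutcome(false));
    System.out.println("with reset:    " + secondAwaitOutcome(true));
  }
}
{code}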
In the meantime, my biggest concern is the "false success" that is returned to clients in the event that the filler code does timeout – so that client/caller code has no idea there was a problem and partial results have been returned. I'm attaching a patch that does 3 things:
* Fixes both "writer" code paths to check {{getError()}} (and report it) regardless of how they finish
** I am not 100% certain that these are the _best_ places to check {{getError()}}, or that there are not _additional_ places that should do this check – but this seemed to address the situations where I could reproduce this type of problem
* Makes the {{CyclicBarrier.await(time)}} timeout value configurable via an (undocumented) EnvUtils sysprop
** So that Solr deployments with large collections have an expert-level mechanism for addressing TimeoutErrors they may encounter
* Removes the call to {{barrier.reset()}}
** It just makes no sense in this code, and leaving it in ensures that _if_ the "filler" times out, the "writer" code (when it is ready to {{exchangeBuffers()}}) winds up waiting the full time limit itself for no reason.

...The way the code is designed, we don't really have a straightforward way to write a junit test for these types of situations, and if we're going to spend time refactoring to improve the test coverage I'd rather focus on bigger-picture refactoring to eliminate the risk of these bugs in general – but I'm open to suggestions if folks want to contribute test cases that should be committed along with these changes.

> Streaming Expressions: Exception swallowed and not propagated back to the
> client leading to inconsistent results
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-17416
>                 URL: https://issues.apache.org/jira/browse/SOLR-17416
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level.
> Issues are Public)
>          Components: streaming expressions
>            Reporter: Lamine
>            Priority: Major
>
> There appears to be a bug in the _ExportWriter/ExportBuffers_ implementation
> within the Streaming Expressions plugin. Specifically, when an
> InterruptedException occurs due to an ExportBuffers timeout, the exception is
> swallowed and not propagated back to the client (still logged on the server
> side though).
> As a result, the client receives an EOF marker, thinking that it has received
> the full set of results, when in fact it has only received partial results.
> This leads to inconsistent search results, as the client is unaware that the
> export process was interrupted and terminated prematurely.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)