gus-asf commented on code in PR #2960: URL: https://github.com/apache/solr/pull/2960#discussion_r1935871739
########## solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java: ########## @@ -29,11 +30,15 @@ */ public class EarlyTerminatingCollector extends FilterCollector { + private final int chunkSize = 100; // Check across threads only at a chunk size Review Comment: How was this chosen? Should it be tuneable via env/sysprop? ########## solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc: ########## @@ -399,6 +399,23 @@ For example, setting `cpuAllowed=500` gives a limit of at most 500 ms of CPU tim All other considerations regarding partial results listed for the `timeAllowed` parameter apply here, too. + +== maxHits Parameter + +[%autowidth,frame=none] +|=== +|Optional |Default: `false` +|=== + +This parameter specifies the max number of hits a searcher will iterate through before early terminating the search. +The count is per shard and across all threads involved in case of multi-threaded search. This parameter works +in conjunction with other parameters that could early terminate a search, ex: _timeAllowed_ etc. In case the search +was early terminated due to it exceeding maxHits a _terminatedEarly_ header in the response will be set along with +_partialResults_ to indicate the same. Note that the _partialResults_ flag could be set in the absence of the _maxHits_ +parameter due to other limits like _timeAllowed_ or _cpuAllowed_. +Note : the hits counted will need not be exactly equal to the maxHits provided, however it will be within range of this value. + + Review Comment: I have a similar concern as Houston regarding helping the user understand when to use this, here's my attempt to rewrite your text based on what I've gathered... of course if I misunderstood something or I'm too wordy we should fix that ;) As a side note I think we're supposed to write these as "ventilated prose" which is one sentence per line, no line breaks in a sentence. ``` == maxHits Parameter [%autowidth,frame=none] |=== |Optional |Default: `false` |=== This parameter specifies the max number of hits a searcher will iterate through. Searchers will arbitrarily ignore any number of additional hits beyond this value. Practically speaking, the count is per shard and across all threads involved in case of multi-threaded search. The intention of this feature is to favor speed over perfect relevancy & recall. The trade-off is that if one shard contains many relevant hits and another contains a few less relevant hits the less relevant hits from the second shard may get returned instead of the more relevant hits that were clustered in the first shard. This parameter works in conjunction with other parameters that could early terminate a search, ex: _timeAllowed_ etc. In case the search was early terminated due to it exceeding maxHits a _terminatedEarly_ header in the response will be set along with _partialResults_ to indicate the same. The _partialResults_ flag could be set in the absence of the _maxHits_ parameter due to other limits like _timeAllowed_ or _cpuAllowed_. ``` I have to admit I don't understand this bit: ``` NOTE: the hits counted will need not be exactly equal to the maxHits provided, however it will be within range of this value. ``` ########## solr/core/src/java/org/apache/solr/search/QueryCommand.java: ########## @@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) { } public boolean getTerminateEarly() { - return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0; + return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0 Review Comment: As I noted in another comment I think this and the Lucene feature are significantly different. I don't like conflating them. ########## solr/core/src/java/org/apache/solr/search/MultiThreadedSearcher.java: ########## @@ -190,10 +194,16 @@ public ScoreMode scoreMode() { static class SearchResult { final ScoreMode scoreMode; private final Object[] result; + final boolean isTerminatedEarly; public SearchResult(ScoreMode scoreMode, Object[] result) { + this(scoreMode, result, false); Review Comment: My IDE claims this unused. Since this is a package private inner class we probably don't need to maintain this constructor for the API. ########## solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java: ########## @@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext context) throws IOExcept public void collect(int doc) throws IOException { super.collect(doc); numCollected++; - if (maxDocsToCollect <= numCollected) { + terminatedEarly = maxDocsToCollect <= numCollected; + if (numCollected % chunkSize == 0) { + pendingDocsToCollect.add(chunkSize); + final long overallCollectedDocCount = pendingDocsToCollect.intValue(); + terminatedEarly = overallCollectedDocCount >= maxDocsToCollect; Review Comment: I think the code would be more legible if you just threw in two cases rather than tracking it with a boolean that is overwritten. This initially looked like a logic error. I expected to an `|=` until I stopped and thought about it for a while to convince myself that the second case can't be false after the first is true. Seems better to me if future maintainers don't have to sort that out for themselves. ########## solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java: ########## @@ -43,11 +48,14 @@ public class EarlyTerminatingCollector extends FilterCollector { * @param maxDocsToCollect - the maximum number of documents to Collect */ public EarlyTerminatingCollector(Collector delegate, int maxDocsToCollect) { - super(delegate); - assert 0 < maxDocsToCollect; Review Comment: Why are the asserts removed? ########## solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java: ########## @@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext context) throws IOExcept public void collect(int doc) throws IOException { super.collect(doc); numCollected++; - if (maxDocsToCollect <= numCollected) { + terminatedEarly = maxDocsToCollect <= numCollected; + if (numCollected % chunkSize == 0) { + pendingDocsToCollect.add(chunkSize); + final long overallCollectedDocCount = pendingDocsToCollect.intValue(); + terminatedEarly = overallCollectedDocCount >= maxDocsToCollect; + } + if (terminatedEarly) { throw new EarlyTerminatingCollectorException( - numCollected, prevReaderCumulativeSize + (doc + 1)); + maxDocsToCollect, prevReaderCumulativeSize + (doc + 1)); } } }; } + + public boolean isTerminatedEarly() { Review Comment: My IDE says this method is unused... do we use it somewhere I've missed? ########## solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java: ########## @@ -29,11 +30,15 @@ */ public class EarlyTerminatingCollector extends FilterCollector { + private final int chunkSize = 100; // Check across threads only at a chunk size + private final int maxDocsToCollect; private int numCollected = 0; private int prevReaderCumulativeSize = 0; private int currentReaderSize = 0; + private final LongAdder pendingDocsToCollect; Review Comment: LongAdder is a neat class I wasn't aware of, but my read of the javadocs for LongAdder is that it's for rarely read values like statistics/metrics (frequent update, infrequent read)... the way you are using it you read from it immediately after every add. Maybe just AtomicLong? Happy to hear arguments to the contrary, but everyone knows what an AtomicLong is so slightly simpler to have that (and perhaps slightly less memory usage according to the javadocs). ########## solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java: ########## @@ -295,7 +295,7 @@ private Collector buildAndRunCollectorChain( final boolean terminateEarly = cmd.getTerminateEarly(); if (terminateEarly) { - collector = new EarlyTerminatingCollector(collector, cmd.getLen()); + collector = new EarlyTerminatingCollector(collector, cmd.getMaxHitsTerminateEarly()); Review Comment: How is it we are replacing getLen() here... how does this not break SOLR-3240 (https://github.com/apache/solr/commit/7141cb35a761eea1f0e6f3ad6abf84a4c972fedc#diff-99978700f1c69d6a5f6c2190f89c98cfe20c441161db5a183ec002e15cb1be28) ########## solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java: ########## @@ -329,12 +329,12 @@ private Collector buildAndRunCollectorChain( if (collector instanceof DelegatingCollector) { ((DelegatingCollector) collector).complete(); } - throw etce; + qr.setPartialResults(true); Review Comment: Does not throwing here effect SOLR-3240, (the original reason for ETCE) ? ########## solr/core/src/java/org/apache/solr/search/QueryCommand.java: ########## @@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) { } public boolean getTerminateEarly() { - return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0; + return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0 Review Comment: And after looking deeper, there's something funny going on in flags. There are two flags: ``` static final int SEGMENT_TERMINATE_EARLY = 0x08; public static final int TERMINATE_EARLY = 0x04; ``` yet I see this code (ide says it's unused) in QueryCommand: ``` public QueryCommand setTerminateEarly(boolean segmentTerminateEarly) { if (segmentTerminateEarly) { return setFlags(SolrIndexSearcher.TERMINATE_EARLY); } else { return clearFlags(SolrIndexSearcher.TERMINATE_EARLY); } } ``` as well as: ``` public QueryCommand setSegmentTerminateEarly(boolean segmentSegmentTerminateEarly) { if (segmentSegmentTerminateEarly) { return setFlags(SolrIndexSearcher.SEGMENT_TERMINATE_EARLY); } else { return clearFlags(SolrIndexSearcher.SEGMENT_TERMINATE_EARLY); } } ``` So there's something we should maybe understand and maybe clean up here... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org