javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1749979372
##########
lucene/core/src/test/org/apache/lucene/index/TestForTooMuchCloning.java:
##########
@@ -80,7 +80,7 @@ public void test() throws Exception {
// System.out.println("query clone count=" + queryCloneCount);
assertTrue(
"too many calls to IndexInput.clone during TermRangeQuery: " +
queryCloneCount,
- queryCloneCount < 50);
+ queryCloneCount <= Math.max(s.getLeafContexts().size(),
s.getSlices().length) * 4);
Review Comment:
I did more digging on this, things were much more predictable before because
we only have 20 docs and a couple of segments. `50` was then high enough for
all cases.
With intra-segment slicing, we end up with more slices, and the number of
clone calls is indeed a factor of the number of slices, but there is more to it
which depends on the collected terms, and I am not sure how to calculate that
exactly or make it predictable. The test does not seem to do that either so
far, it just has a high enough value.
I see two clone calls per partition done when `Weight#scorerSupplier` is
called. They are from two different places in the `IntersectTermsEnum`
constructor (line 89 and 98).
I see other two clone calls per partition done when
`ScorerSupplier#bulkScorer` is called.
That is where my current factor of 4 came from. It seems to work in the vast
majority of the cases. I have a seed (`F02FB6F046EE8C80`) where I have 9
partitions, but a total of 40 clone calls as part of search. Those additional 4
calls are rather unpredictable, I think that it depends on whether the query is
rewritten as a boolean query or not.
After all this analysis though, I realize that I came up with a formula that
is more restrictive than the previous fixed value. Here we are searching 9
partitions, and the total becomes `36` (against `40` real value), while
previously the fixed limit was `50`. I think I can increase the factor further
without worrying too much. Key is that we are currently limiting the max number
of partitions per segments to 5. We will need to adapt things in tests if we
remove that limit in the future, which is likely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]