Re: [PR] Add support for intra-segment search concurrency [lucene]

via GitHub Mon, 09 Sep 2024 03:10:22 -0700


javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1749979372



##########
lucene/core/src/test/org/apache/lucene/index/TestForTooMuchCloning.java:
##########
@@ -80,7 +80,7 @@ public void test() throws Exception {
     // System.out.println("query clone count=" + queryCloneCount);
     assertTrue(
         "too many calls to IndexInput.clone during TermRangeQuery: " + 
queryCloneCount,
-        queryCloneCount < 50);
+        queryCloneCount <= Math.max(s.getLeafContexts().size(), 
s.getSlices().length) * 4);

Review Comment:
   I did more digging on this, things were much more predictable before because 
we only have 20 docs and a couple of segments. `50` was then high enough for 
all cases.
   
   With intra-segment slicing, we end up with more slices, and the number of 
clone calls is indeed a factor of the number of slices, but there is more to it 
which depends on the collected terms, and I am not sure how to calculate that 
exactly or make it predictable. The test does not seem to do that either so 
far, it just has a high enough value.
   
   I see two clone calls per partition done when `Weight#scorerSupplier` is 
called. They are from two different places in the `IntersectTermsEnum` 
constructor (line 89 and 98). 
   
   I see other two clone calls per partition done when 
`ScorerSupplier#bulkScorer` is called.
   
   That is where my current factor of 4 came from. It seems to work in the vast 
majority of the cases. I have a seed (`F02FB6F046EE8C80`) where I have 9 
partitions, but a total of 40 clone calls as part of search. Those additional 4 
calls are rather unpredictable, I think that it depends on whether the query is 
rewritten as a boolean query or not. 
   
   After all this analysis though, I realize that I came up with a formula that 
is more restrictive than the previous fixed value. Here we are searching 9 
partitions, and the total becomes `36` (against `40` real value), while 
previously the fixed limit was `50`. I think I can increase the factor further 
without worrying too much. Key is that we are currently limiting the max number 
of partitions per segments to 5. We will need to adapt things in tests if we 
remove that limit in the future, which is likely.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add support for intra-segment search concurrency [lucene]

Reply via email to