[https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830173#comment-16830173]
Michael McCandless commented on LUCENE-8757:
--------------------------------------------
Thanks [~atris] – I agree it's important to have better defaults for how we
coalesce segments into per-query-per-thread work units. A few small comments:
* Can you insert {{_}} in the big number constants (e.g. {{25000000}})? That
makes them easier to read, and open-source code is written for reading :)
* I think something is wrong with {{docSum}} – you only set it, and never add
to it? I think the intention is to sum up docs in multiple adjacent (sorted by
{{maxDoc}}) segments until that count exceeds {{25000000}}?
* How did you pick {{25000000}} and {{100}} as good constants? We are using
much smaller values in our production infrastructure – {{250_000}} and {{5}},
admittedly after only a little experimentation.
* Can you add some tests? Maybe make the slice method a package-private static
method and then create test cases with "interesting" {{LeafReaderContext}}
combinations? In particular, a test case exposing the {{docSum}} bug would be
great; then fix that bug and see the test case pass.
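To make the {{docSum}} point concrete, here is a minimal Java sketch of what such a package-private static slicing method could look like. It is not the code in LUCENE-8757.patch: {{Leaf}} is a hypothetical stand-in for {{LeafReaderContext}}, sorting by {{maxDoc}} in descending order is an assumption, and the limits are the {{250_000}} / {{5}} production values mentioned above. The important line is the one that adds each segment's {{maxDoc}} to {{docSum}} rather than overwriting it.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of size-aware slicing; not the attached patch and not Lucene's API.
final class SliceSketch {

  // Hypothetical stand-in for LeafReaderContext: only maxDoc matters here.
  static final class Leaf {
    final String name;
    final int maxDoc;

    Leaf(String name, int maxDoc) {
      this.name = name;
      this.maxDoc = maxDoc;
    }

    @Override
    public String toString() {
      return name + "(" + maxDoc + ")";
    }
  }

  // Limits borrowed from the production values quoted in the comment (250_000 and 5).
  static final int MAX_DOCS_PER_SLICE = 250_000;
  static final int MAX_SEGMENTS_PER_SLICE = 5;

  // Package-private and static so tests can feed it synthetic leaves directly.
  static List<List<Leaf>> slices(List<Leaf> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    List<Leaf> sorted = new ArrayList<>(leaves);
    // Assumption: largest segments first, so big ones stand alone and small ones get grouped.
    sorted.sort((a, b) -> Integer.compare(b.maxDoc, a.maxDoc));

    List<List<Leaf>> result = new ArrayList<>();
    List<Leaf> current = new ArrayList<>();
    long docSum = 0;
    for (Leaf leaf : sorted) {
      if (leaf.maxDoc > maxDocsPerSlice) {
        // An oversized segment gets a slice of its own.
        result.add(List.of(leaf));
        continue;
      }
      current.add(leaf);
      docSum += leaf.maxDoc; // accumulate across adjacent segments, do not overwrite
      if (docSum > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
        result.add(current);
        current = new ArrayList<>();
        docSum = 0;
      }
    }
    if (!current.isEmpty()) {
      result.add(current);
    }
    return result;
  }

  public static void main(String[] args) {
    // Three equal mid-sized segments: if docSum were only assigned (never summed),
    // it would never exceed the limit and all three would share one slice.
    List<Leaf> leaves = List.of(new Leaf("a", 150_000), new Leaf("b", 150_000), new Leaf("c", 150_000));
    System.out.println(slices(leaves, MAX_DOCS_PER_SLICE, MAX_SEGMENTS_PER_SLICE));
    // prints [[a(150000), b(150000)], [c(150000)]]
  }
}
```

Because the sum is checked after adding a segment, a slice in this sketch can overshoot {{maxDocsPerSlice}} by up to one segment, which matches "sum up docs ... until that count exceeds" above; checking before adding is an alternative worth testing.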
> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one
> thread per segment. This hurts performance when segment sizes are skewed,
> since even small segments get a dedicated thread, and the extra threads add
> context-switching overhead.
>
> A better algorithm that is cognizant of size skew would perform better in
> realistic scenarios.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)