[
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844595#comment-16844595
]
Adrien Grand commented on LUCENE-8757:
--------------------------------------
[~atris] I think it is still not correct: the docBase/maxDoc values are only
visible to the current leaf collector, while we need this check to hold across
all leaf collectors created from the same collector.
Looking at AssertingCollector again, I see that it already checks that doc IDs
are collected in doc ID order, so I wonder why this assertion didn't trip with
the earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe
we just got lucky? Nevertheless, I think it's worth adding another assertion
that leaves are collected in the right order and that their doc ID spaces don't
intersect as described above. For example, we could record a
{{previousLeafMaxDoc}} at the same level as {{maxDoc}} in AssertingCollector,
and then in {{getLeafCollector}} do something like
{code}
// in AssertingCollector#getLeafCollector(LeafReaderContext context):
// generally equal, but might be greater if some leaves are skipped
assert context.docBase >= previousLeafMaxDoc;
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}
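To illustrate why the state has to live on the collector rather than on each leaf collector, here is a minimal, self-contained sketch of the same check. {{LeafOrderChecker}} and {{onLeaf}} are hypothetical names standing in for the {{previousLeafMaxDoc}} field and the assertion in {{getLeafCollector}}; this is not AssertingCollector's actual code, and it throws an AssertionError explicitly so that it fires even when the JVM runs without {{-ea}}.

{code:java}
// Hypothetical stand-in for the state AssertingCollector would keep: ONE instance
// is shared by all leaf collectors created from the same collector.
final class LeafOrderChecker {
  // End (exclusive) of the previous leaf's doc ID range in top-level doc ID space.
  private int previousLeafMaxDoc = 0;

  /** Called once per leaf, in the order the leaves are handed to the collector. */
  void onLeaf(int docBase, int leafMaxDoc) {
    // Generally equal, but may be greater if some leaves are skipped.
    if (docBase < previousLeafMaxDoc) {
      throw new AssertionError("leaf with docBase=" + docBase
          + " collected after a leaf ending at doc " + previousLeafMaxDoc);
    }
    previousLeafMaxDoc = docBase + leafMaxDoc;
  }
}
{code}

For example, with one leaf at docBase 0 (maxDoc 5) and another at docBase 5 (maxDoc 100), visiting them by decreasing maxDoc, as the earlier patch did, calls {{onLeaf(5, 100)}} and then {{onLeaf(0, 5)}}, and the second call trips the check. A per-leaf-collector field could never see this, because each leaf collector only ever observes its own docBase/maxDoc.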
> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Assignee: Simon Willnauer
> Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one
> thread per segment. This hurts performance when segment sizes are skewed,
> since even small segments get a dedicated thread, and the extra threads can
> degrade performance through context-switching overhead.
>
> A better algorithm that is cognizant of size skew would perform better in
> realistic scenarios.
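The description does not prescribe an algorithm. As one possible illustration, the sketch below gives each large segment its own slice (one slice per thread) and packs small segments together, so several small segments share a thread instead of getting one each. {{Leaf}}, {{SizeAwareSlicer}}, and both thresholds are hypothetical; this is not the patch's code or Lucene's API.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified view of a segment: where its doc IDs start and how many it has.
final class Leaf {
  final int docBase;
  final int maxDoc;

  Leaf(int docBase, int maxDoc) {
    this.docBase = docBase;
    this.maxDoc = maxDoc;
  }
}

final class SizeAwareSlicer {
  /**
   * Groups leaves into slices, one slice per thread. A leaf with at least maxDocsPerSlice
   * docs gets its own slice; smaller leaves are packed until a slice holds maxDocsPerSlice
   * docs or maxLeavesPerSlice leaves. Both thresholds are illustrative assumptions.
   */
  static List<List<Leaf>> slice(List<Leaf> leaves, int maxDocsPerSlice, int maxLeavesPerSlice) {
    List<Leaf> sorted = new ArrayList<>(leaves);
    // Visit the biggest leaves first so they are isolated before small ones are packed.
    sorted.sort(Comparator.comparingInt((Leaf l) -> l.maxDoc).reversed());

    List<List<Leaf>> slices = new ArrayList<>();
    List<Leaf> current = new ArrayList<>();
    long docsInCurrent = 0;
    for (Leaf leaf : sorted) {
      if (leaf.maxDoc >= maxDocsPerSlice) {
        slices.add(List.of(leaf)); // large enough to deserve its own thread
        continue;
      }
      current.add(leaf);
      docsInCurrent += leaf.maxDoc;
      if (docsInCurrent >= maxDocsPerSlice || current.size() >= maxLeavesPerSlice) {
        slices.add(current);
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
    }
    if (!current.isEmpty()) {
      slices.add(current);
    }

    // Sorting by size scrambles doc ID order, so restore docBase order inside each slice:
    // the collector that searches a slice then sees its leaves in increasing doc ID order.
    List<List<Leaf>> ordered = new ArrayList<>();
    for (List<Leaf> s : slices) {
      List<Leaf> copy = new ArrayList<>(s);
      copy.sort(Comparator.comparingInt((Leaf l) -> l.docBase));
      ordered.add(copy);
    }
    return ordered;
  }
}
{code}

Sorting by size only decides which leaves share a slice. Re-sorting each slice by docBase keeps the per-collector ordering that a {{previousLeafMaxDoc}}-style assertion would check, assuming each slice is searched with its own collector (as when searching through a CollectorManager).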
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)