[
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834463#comment-16834463
]
Atri Sharma commented on LUCENE-8757:
-------------------------------------
bq. I don't think we should push this if we already know we wanna do something
different. That said, I am not convinced the numbers are good defaults. At the
same time I don't have any numbers here do you have anything to back these
defaults up?
Sure. The reason I suggested pushing this patch as it stands is that the
other approach we are pursuing would require a couple of new semantics to be
introduced, so we may want to give users the option to opt in to either of
the two. That said, I believe the cost-based algorithm would also require some
hard defaults, to ensure that small segments do not get independent threads
even if the system has the capacity.
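A minimal sketch of the hard-default idea, not the code in the attached patch.
It assumes plain document counts in place of real LeafReaderContexts, and the
class name, method name, and both limits are hypothetical placeholders chosen
only for illustration. Large segments get a slice (thread) of their own, while
small segments are packed together until a limit is hit:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: integer doc counts stand in for Lucene segments.
final class SliceSketch {
    // Placeholder limits for illustration only; not the values from the patch.
    static final int MAX_DOCS_PER_SLICE = 1_000;
    static final int MAX_SEGMENTS_PER_SLICE = 3;

    /** Groups segment doc counts into slices; each slice would be searched by one thread. */
    static List<List<Integer>> slices(List<Integer> segmentDocCounts) {
        // Largest first, so big segments are placed before small ones are packed.
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort((a, b) -> Integer.compare(b, a));

        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;
        for (int docCount : sorted) {
            if (docCount > MAX_DOCS_PER_SLICE) {
                // A segment above the doc limit gets a slice of its own.
                result.add(List.of(docCount));
                continue;
            }
            // Small segments share a slice until either limit is reached.
            current.add(docCount);
            docsInCurrent += docCount;
            if (docsInCurrent > MAX_DOCS_PER_SLICE
                    || current.size() >= MAX_SEGMENTS_PER_SLICE) {
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With these placeholder limits, SliceSketch.slices(List.of(5000, 10, 20, 30, 40))
returns [[5000], [40, 30, 20], [10]]: three slices instead of five threads.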
RE: the default constant values: these numbers are derived from empirical
testing across different datasets in ESRally (nyc_taxis, logging) and from
looking at the default segment size distribution of the wikipedia10M dataset
in luceneutil. However, they might not be good default sizes to split on.
One thing we could do (albeit at some cost per query) is to take the mean
number of documents across the LeafReaderContexts for a query as the split
point. Would that be a better, more dynamic approach?
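The mean-based split point could be sketched roughly as follows. This is an
illustration under assumptions, not a proposed implementation: the class and
method names are hypothetical, and plain integers stand in for the per-segment
document counts (in Lucene these would come from something like
ctx.reader().maxDoc() for each LeafReaderContext).

```java
import java.util.List;

// Hypothetical sketch: integer doc counts stand in for Lucene segments.
final class MeanSplitSketch {
    /** Mean doc count across the segments of a query (integer division); 0 if there are none. */
    static long meanDocCount(List<Integer> segmentDocCounts) {
        if (segmentDocCounts.isEmpty()) {
            return 0;
        }
        long total = 0;
        for (int docCount : segmentDocCounts) {
            total += docCount;
        }
        return total / segmentDocCounts.size();
    }

    /** A segment larger than the split point would get its own thread. */
    static boolean getsOwnThread(int docCount, long splitPoint) {
        return docCount > splitPoint;
    }
}
```

For doc counts 5000, 10, 20, 30 and 40 the mean is 1020, so only the
5000-document segment would get its own thread. The cost is one extra pass over
the leaves per query, which is the expense mentioned above.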
> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one
> thread per segment. This is detrimental to performance in case of skew in
> segment sizes since small segments also get their dedicated thread. This can
> lead to performance degradation due to context switching overheads.
>
> A better algorithm that is cognizant of size skew would perform better in
> realistic scenarios.