[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834463#comment-16834463
 ] 

Atri Sharma commented on LUCENE-8757:
-------------------------------------

bq. I don't think we should push this if we already know we wanna do something 
different. That said, I am not convinced the numbers are good defaults. At the 
same time I don't have any numbers here do you have anything to back these 
defaults up?

 

Sure. The reason I suggested pushing this patch as it stands is that the 
other approach we are pursuing would require a couple of new semantics to be 
introduced, so we could potentially want users to be able to opt in to 
either of the two. That said, I believe the cost-based algorithm would also 
require some hard defaults, to ensure that small segments do not get 
independent threads even if the system has the capacity.

 

RE: the default constant values: these numbers are derived from empirical 
testing across different datasets in ESRally (nyc_taxis, logging) and from 
looking at the default segment size distribution of the wikipedia10M dataset 
in luceneutil. However, they might not be good default sizes to split on.

 

One thing we could do (albeit expensive) is to take the mean number of 
documents in the corresponding LeafReaderContexts for a query as the split 
point. Would that be a better, more dynamic approach?
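
To make the mean-based idea concrete, here is a rough sketch in plain Java. 
It is not the attached patch: the class and method names are hypothetical, 
and the per-segment document counts are passed in as a plain int[] (in 
Lucene they would presumably come from LeafReaderContext.reader().maxDoc()). 
Segments at or above the mean get their own slice, and smaller ones are 
packed together until a shared slice reaches the mean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: names are illustrative, not taken from the patch.
class MeanSplitSketch {

    /** Mean document count across the segments (0 if there are none). */
    static long meanDocCount(int[] segmentDocCounts) {
        if (segmentDocCounts.length == 0) {
            return 0;
        }
        long total = 0;
        for (int count : segmentDocCounts) {
            total += count;
        }
        return total / segmentDocCounts.length;
    }

    /**
     * Groups segment indices into slices, using the mean doc count as the
     * split point. Each returned list holds the indices of the segments
     * that would be searched by one thread.
     */
    static List<List<Integer>> slices(int[] segmentDocCounts) {
        long splitPoint = meanDocCount(segmentDocCounts);
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> shared = new ArrayList<>();
        long sharedDocs = 0;

        for (int i = 0; i < segmentDocCounts.length; i++) {
            int docs = segmentDocCounts[i];
            if (docs >= splitPoint) {
                // Large segment: give it a slice of its own.
                result.add(Collections.singletonList(i));
            } else {
                // Small segment: pack it into the current shared slice.
                shared.add(i);
                sharedDocs += docs;
                if (sharedDocs >= splitPoint) {
                    // Shared slice is as big as the mean: close it.
                    result.add(shared);
                    shared = new ArrayList<>();
                    sharedDocs = 0;
                }
            }
        }
        if (!shared.isEmpty()) {
            // Leftover small segments still share a single slice.
            result.add(shared);
        }
        return result;
    }
}
```

The extra cost is one pass over the leaves per query to compute the mean, 
which is the expense mentioned above. A hard cap on segments per slice could 
still be layered on top of this.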

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm that is cognizant of size skew would have better 
> performance in realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
