[
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848
]
Simon Willnauer edited comment on LUCENE-8829 at 6/12/19 8:56 AM:
------------------------------------------------------------------
I'd remove the _setShardIndex_ parameter altogether and not set it.
was (Author: simonw):
I'd remove the _ setShardIndex_ parameter alltogether and don't set it
> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -----------------------------------------------------------------
>
> Key: LUCENE-8829
> URL: https://issues.apache.org/jira/browse/LUCENE-8829
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Atri Sharma
> Priority: Major
> Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch,
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I found that the order of results from
> TopDocs#merge depends indirectly on the number of collectors involved in the
> merge. This is troubling because 1) the number of collectors involved in a
> merge is cost based and depends directly on the number of slices created
> for the parallel searcher case, and 2) the TopN hits code path invokes merge
> with a single Collector. As a result, running the same TopN query with a
> single-threaded searcher and with a parallel searcher can return results in
> a different order, which breaks an invariant callers should be able to rely on.
>
> This happens because of the subtle way TopDocs#merge sets shardIndex on each
> ScoreDoc while populating the priority queue used for merging. shardIndex is
> essentially set to the ordinal of the collector that generated the hit. This
> means that shardIndex depends on the number of collectors, even for the same
> set of hits.
>
> When no sort order is specified, shardIndex is used for tie breaking when
> scores are equal. The same hits can therefore come back in different orders
> when their shardIndex values differ.
>
> I propose that we remove shardIndex from the default tie-breaking mechanism
> and replace it with docID. DocID order is the de facto order expected
> during collection, so it makes sense to use the same criterion for tie
> breaking when scores are equal.
>
> CC: [~ivera]
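
To make the behaviour described above concrete, here is a minimal sketch in plain Java. It does not use Lucene: the Hit class, the merge helper and both comparators are hypothetical stand-ins, and the slice layout (the slice holding docs 2 and 3 gets ordinal 0) is an assumption chosen for illustration. It only models the tie-breaking rule, not the actual TopDocs#merge implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TieBreakSketch {
    /** Hypothetical stand-in for a scored hit; this is not Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        final int shardIndex; // ordinal of the collector that produced the hit

        Hit(int doc, float score, int shardIndex) {
            this.doc = doc;
            this.score = score;
            this.shardIndex = shardIndex;
        }
    }

    // Higher score first; ties broken by collector ordinal, then by doc.
    static final Comparator<Hit> BY_SHARD_INDEX =
        Comparator.comparingDouble((Hit h) -> -h.score)
            .thenComparingInt(h -> h.shardIndex)
            .thenComparingInt(h -> h.doc);

    // Higher score first; ties broken by doc only (the proposed behaviour).
    static final Comparator<Hit> BY_DOC_ID =
        Comparator.comparingDouble((Hit h) -> -h.score)
            .thenComparingInt(h -> h.doc);

    /**
     * Merges per-collector doc lists into one ranked list of doc ids.
     * docsPerCollector[i] holds the docs seen by collector i; scores[doc]
     * is the score of that doc.
     */
    static List<Integer> merge(int[][] docsPerCollector, float[] scores, Comparator<Hit> order) {
        List<Hit> hits = new ArrayList<>();
        for (int shard = 0; shard < docsPerCollector.length; shard++) {
            for (int doc : docsPerCollector[shard]) {
                hits.add(new Hit(doc, scores[doc], shard));
            }
        }
        hits.sort(order);
        List<Integer> ranked = new ArrayList<>();
        for (Hit h : hits) {
            ranked.add(h.doc);
        }
        return ranked;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 1.0f, 1.0f, 2.0f}; // docs 0..2 tie on score
        int[][] oneCollector = {{0, 1, 2, 3}};
        int[][] twoCollectors = {{2, 3}, {0, 1}};  // assumed slice layout

        System.out.println(merge(oneCollector, scores, BY_SHARD_INDEX));
        System.out.println(merge(twoCollectors, scores, BY_SHARD_INDEX));
        System.out.println(merge(oneCollector, scores, BY_DOC_ID));
        System.out.println(merge(twoCollectors, scores, BY_DOC_ID));
    }
}
```

With the collector ordinal as the tie breaker, the two layouts rank the tied docs differently even though the hits are identical. With docID as the tie breaker, the order no longer depends on how the hits were split across collectors.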