Hello everybody,
I have observed an unexpected behavior in Lucene, and I am unsure
whether this is a bug, or a missing warning in the documentation:
I am using the IndexSearcher with an ExecutorService in order to take
advantage of multiple CPU cores during the searches. I want to limit the
number of cores a single search can occupy, so I have overridden the
IndexSearcher method
protected LeafSlice[] slices(List<LeafReaderContext> leaves)
to return a fixed number of slices (e.g. 4).
I tried to create slices that are about the same size by looping over
the leaves (ordered by size descending) and adding the current leaf to
the slice with the smallest number of documents.
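For illustration, the balancing strategy can be sketched roughly as
follows. This is a simplified stand-in, not my actual override: it works
on plain document counts instead of LeafReaderContext objects, and the
names GreedySlicer and slice are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GreedySlicer {

    /**
     * Distributes leaves over a fixed number of slices.
     * leafDocCounts[i] is the (hypothetical) document count of leaf i.
     * Returns, for each slice, the indices of the leaves assigned to it.
     */
    static List<List<Integer>> slice(int[] leafDocCounts, int numSlices) {
        // Leaf indices ordered by document count, largest first.
        Integer[] order = new Integer[leafDocCounts.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order,
                Comparator.comparingInt((Integer i) -> leafDocCounts[i]).reversed());

        List<List<Integer>> slices = new ArrayList<>();
        for (int s = 0; s < numSlices; s++) {
            slices.add(new ArrayList<>());
        }
        long[] docsPerSlice = new long[numSlices];

        for (int leaf : order) {
            // Pick the slice that currently holds the fewest documents.
            int target = 0;
            for (int s = 1; s < numSlices; s++) {
                if (docsPerSlice[s] < docsPerSlice[target]) {
                    target = s;
                }
            }
            slices.get(target).add(leaf);
            docsPerSlice[target] += leafDocCounts[leaf];
        }
        return slices;
    }

    public static void main(String[] args) {
        int[] leafDocCounts = {100, 90, 20, 10};
        // Prints [[0, 3], [1, 2]]: slice 0 holds leaves 0 and 3,
        // which are not adjacent in the original leaves list.
        System.out.println(slice(leafDocCounts, 2));
    }
}
```

Note that balancing by size in this way generally produces slices whose
leaves are not neighbors in the original list.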
This worked well, until I stumbled upon a query for which searchAfter
seemed to skip hits, so that the total number of hits obtained by
multiple calls to searchAfter was lower than TopDocs.totalHits.
The issue seems to be how searchAfter works vs how TopDocs.merge works:
searchAfter skips every document with a higher score than the "after"
document. In case of equal scores, it uses the document id and skips
every document whose id is <= the id of the "after" document (see
PagingFieldCollector).
TopDocs.merge uses the score to determine which hits should be part of
the merged TopDocs. In case of equal scores, it uses the shard index
(this corresponds to the slices the IndexSearcher uses) to break ties
(see ScoreMergeSortQueue.lessThan).
So if the shards are non-contiguous (as they are in my case), searchAfter
orders documents with equal scores differently than TopDocs.merge does,
and therefore hits are skipped.
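To make the mismatch concrete, here is a toy model of the two orderings
as I understand them. It does not use Lucene at all: the Hit class, the
pageThrough method and the assumption of one document per leaf with
identical scores are all invented for this example, and the tie-breaking
rules are only my reading of the code described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PagingMismatchDemo {

    /** Toy hit: score, global document id, and the slice (shard) it came from. */
    static final class Hit {
        final float score;
        final int doc;
        final int shard;

        Hit(float score, int doc, int shard) {
            this.score = score;
            this.doc = doc;
            this.shard = shard;
        }
    }

    // Merge order as described above: higher score first, ties broken by
    // shard index; within one shard, hits stay in document id order.
    static final Comparator<Hit> MERGE_ORDER =
            Comparator.comparingDouble((Hit h) -> -h.score)
                    .thenComparingInt(h -> h.shard)
                    .thenComparingInt(h -> h.doc);

    // searchAfter filter as described above: keep lower scores, and for
    // equal scores keep only documents with a larger document id.
    static boolean isAfter(Hit h, Hit after) {
        if (after == null) {
            return true;
        }
        if (h.score != after.score) {
            return h.score < after.score;
        }
        return h.doc > after.doc;
    }

    /** Pages through all documents; sliceOfDoc[d] is the slice holding document d. */
    static List<Integer> pageThrough(int[] sliceOfDoc, int pageSize) {
        List<Integer> collected = new ArrayList<>();
        Hit after = null;
        while (true) {
            List<Hit> candidates = new ArrayList<>();
            for (int doc = 0; doc < sliceOfDoc.length; doc++) {
                Hit h = new Hit(1.0f, doc, sliceOfDoc[doc]); // all scores equal
                if (isAfter(h, after)) {
                    candidates.add(h);
                }
            }
            if (candidates.isEmpty()) {
                return collected;
            }
            candidates.sort(MERGE_ORDER);
            List<Hit> page = candidates.subList(0, Math.min(pageSize, candidates.size()));
            for (Hit h : page) {
                collected.add(h.doc);
            }
            after = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Contiguous slices {0,1} and {2,3}: prints [0, 1, 2, 3]
        System.out.println(pageThrough(new int[] {0, 0, 1, 1}, 2));
        // Non-contiguous slices {0,3} and {1,2}: prints [0, 3]
        System.out.println(pageThrough(new int[] {0, 1, 1, 0}, 2));
    }
}
```

In the second call the first page ends with document 3, so the next
searchAfter call discards documents 1 and 2 even though they were never
returned.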
Here are my questions:
* Are slices meant to be contiguous "sublists" of the passed list of
leaves? Or is my way of slicing meant to be supported?
* If my way of slicing is not supported, could you either add a warning
to the javadocs of the slices method or maybe even add a check for a
legal return value of slices()?
* Should I create a Jira issue for this?
Sorry for the wall of text, I hope I explained the problem in an
understandable way!
Thank you and best regards
Christoph