Hello everybody,

I have observed unexpected behavior in Lucene, and I am unsure whether this is a bug or a missing warning in the documentation:

I am using IndexSearcher with an ExecutorService to take advantage of multiple CPU cores during searches. I want to limit the number of cores a single search can occupy, so I have overridden the IndexSearcher method

    protected LeafSlice[] slices(List<LeafReaderContext> leaves)

to return a fixed number of slices (e.g. 4).

I tried to create slices of roughly equal size by iterating over the leaves in descending order of size and adding each leaf to the slice that currently contains the fewest documents.
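For illustration, here is a minimal plain-Java model of that greedy slicing. It uses no Lucene types: the class name GreedySlicer is hypothetical, an int[] of leaf sizes stands in for the List<LeafReaderContext>, and each slice is returned as a list of leaf ords (positions in the leaves list).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical plain-Java model of the greedy slicing (no Lucene types):
// leafSizes[i] stands in for the document count of leaf i in the leaves list.
class GreedySlicer {
    static List<List<Integer>> slice(int[] leafSizes, int numSlices) {
        List<List<Integer>> slices = new ArrayList<>();
        long[] sliceDocs = new long[numSlices];
        for (int s = 0; s < numSlices; s++) {
            slices.add(new ArrayList<>());
        }
        // Visit leaf ords from the largest leaf to the smallest.
        List<Integer> bySizeDesc = IntStream.range(0, leafSizes.length)
            .boxed()
            .sorted(Comparator.comparingInt((Integer i) -> leafSizes[i]).reversed())
            .collect(Collectors.toList());
        for (int leaf : bySizeDesc) {
            // Pick the slice that currently holds the fewest documents (first one on ties).
            int target = 0;
            for (int s = 1; s < numSlices; s++) {
                if (sliceDocs[s] < sliceDocs[target]) {
                    target = s;
                }
            }
            slices.get(target).add(leaf);
            sliceDocs[target] += leafSizes[leaf];
        }
        return slices;
    }
}
```

With leaf sizes {100, 80, 60, 40, 20} and two slices, this yields the slices [0, 3, 4] and [1, 2], so the first slice is not a contiguous run of leaves.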

This worked well until I stumbled upon a query for which searchAfter seemed to skip hits: the total number of hits obtained by paging through the results with repeated searchAfter calls was lower than TopDocs.totalHits.

The issue seems to be that searchAfter and TopDocs.merge order hits with equal scores differently:

searchAfter skips every document with a higher score than the "after" document. In case of equal scores, it falls back to the document id and skips every document whose id is less than or equal to that of the "after" document (see PagingFieldCollector).
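The paging rule as I understand it can be written as a small predicate. This is a model of the behavior described above, not Lucene's actual collector code; SearchAfterRule and isOnNextPage are hypothetical names, and doc ids are assumed to be index-wide (docBase plus the segment-local id).

```java
// Hypothetical model of the searchAfter paging rule for relevance-sorted results.
// Doc ids are assumed to be index-wide (docBase + segment-local doc id).
class SearchAfterRule {
    // Returns true if a hit can still appear on a page requested after (afterScore, afterDoc).
    static boolean isOnNextPage(float score, int doc, float afterScore, int afterDoc) {
        if (score > afterScore) {
            return false; // ranked before the "after" hit, so already paged over
        }
        if (score < afterScore) {
            return true; // ranked after the "after" hit
        }
        return doc > afterDoc; // equal scores: only larger doc ids count as "later"
    }
}
```

In other words, searchAfter implicitly assumes that hits are ordered by score descending and then by doc id ascending.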

TopDocs.merge uses the score to determine which hits should be part of the merged TopDocs. In case of equal scores, it breaks ties by the shard index, which corresponds to the slice a hit came from in IndexSearcher (see ScoreMergeSortQueue.lessThan).
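The merge order can be modeled the same way. Again this is a sketch, not Lucene's code: MergeOrder and Hit are hypothetical stand-ins for the merge queue and ScoreDoc, and within one shard the tied hits are assumed to already be in doc-id order, so the doc id stands in for a hit's position in that shard's list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the order TopDocs.merge produces for score-sorted shards.
class MergeOrder {
    // Stand-in for ScoreDoc: score, the shard (slice) it came from, and its index-wide doc id.
    static final class Hit {
        final float score;
        final int shardIndex;
        final int doc;

        Hit(float score, int shardIndex, int doc) {
            this.score = score;
            this.shardIndex = shardIndex;
            this.doc = doc;
        }
    }

    // Score descending; equal scores fall back to the shard index, then the doc id.
    static final Comparator<Hit> MERGE_ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparingInt(h -> h.shardIndex)
            .thenComparingInt(h -> h.doc);

    static List<Hit> merge(List<Hit> allHits) {
        List<Hit> merged = new ArrayList<>(allHits);
        merged.sort(MERGE_ORDER);
        return merged;
    }
}
```

For example, with equal scores, doc 350 from shard 0 is placed before doc 150 from shard 1, which is the opposite of the doc-id order that searchAfter relies on.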

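So if the slices are not contiguous (as in my case), searchAfter and TopDocs.merge sort tied documents differently, and hits are skipped: a page can end with a tied document from a lower shard index whose id is higher than the ids of tied documents from later shards, and the next searchAfter call then discards those documents because their ids are <= the "after" document's id.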
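Putting both rules together, the following toy simulation (not Lucene code; SkippedHitsDemo, pageThroughAll and the docToShard layout are hypothetical) pages through four hits that all share the same score, so only the tie-breaks matter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy simulation of paging with searchAfter over slices.
// All hits have the same score; docToShard[doc] is the slice that doc's leaf was assigned to.
class SkippedHitsDemo {
    static List<Integer> pageThroughAll(int[] docToShard, int pageSize) {
        List<Integer> seen = new ArrayList<>();
        Integer afterDoc = null; // null means "first page"
        while (true) {
            List<Integer> candidates = new ArrayList<>();
            for (int doc = 0; doc < docToShard.length; doc++) {
                // searchAfter rule: with equal scores, only doc ids > afterDoc survive.
                if (afterDoc == null || doc > afterDoc) {
                    candidates.add(doc);
                }
            }
            // Merge rule: equal scores are ordered by shard index first, then doc id.
            candidates.sort(Comparator.comparingInt((Integer d) -> docToShard[d])
                .thenComparingInt(d -> d));
            List<Integer> page = candidates.subList(0, Math.min(pageSize, candidates.size()));
            if (page.isEmpty()) {
                return seen;
            }
            seen.addAll(page);
            afterDoc = page.get(page.size() - 1); // the last hit becomes the next "after"
        }
    }
}
```

With the non-contiguous layout {0, 1, 1, 0} and a page size of 2, the first page is docs 0 and 3, and the second call finds nothing after doc 3, so docs 1 and 2 are never returned. With the contiguous layout {0, 0, 1, 1}, all four documents come back in order.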

Here are my questions:

* Are slices meant to be contiguous "sublists" of the passed leaves list? Or is my way of slicing meant to be supported?
* If my way of slicing is not supported, could you either add a warning to the javadocs of the slices method, or perhaps even add a check for a legal return value of slices()?
* Should I create a JIRA issue for this?
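Regarding the second question, a check of the kind I have in mind could look like the following conservative sketch. It is hypothetical (SliceValidator is not an existing Lucene API) and works on leaf ords rather than LeafSlice objects: it accepts a slicing only if concatenating the slices in order yields the leaf ords 0..n-1, which may be stricter than necessary but rules out layouts like mine.

```java
import java.util.List;

// Hypothetical validation of a slicing, expressed on leaf ords instead of LeafSlice objects.
class SliceValidator {
    // True if the slices, read in order, list every leaf ord 0..numLeaves-1 exactly once and in sequence.
    static boolean isContiguousAndOrdered(List<List<Integer>> slices, int numLeaves) {
        int expected = 0;
        for (List<Integer> slice : slices) {
            for (int ord : slice) {
                if (ord != expected) {
                    return false; // gap, duplicate, or out-of-order leaf
                }
                expected++;
            }
        }
        return expected == numLeaves; // every leaf must be covered
    }
}
```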

Sorry for the wall of text; I hope I explained the problem in an understandable way!

Thank you and best regards
Christoph

