[ https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882295#comment-16882295 ]
ASF subversion and git services commented on LUCENE-8875:
---------------------------------------------------------
Commit 7339eb272c30e993e0a8e73154fdfca8ef9879e4 in lucene-solr's branch
refs/heads/branch_8x from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7339eb2 ]
LUCENE-8875: Introduce Optimized Collector For Large Number Of Hits (#754)
This commit introduces a new collector which is optimized for
cases when the number of hits is large and/or the actual hits
collected are sparse in comparison to the number of hits
requested.
> Should TopScoreDocCollector Always Populate Sentinel Values?
> ------------------------------------------------------------
>
> Key: LUCENE-8875
> URL: https://issues.apache.org/jira/browse/LUCENE-8875
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
> Time Spent: 9h
> Remaining Estimate: 0h
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation,
> and instructs HitQueue to populate with sentinels. While this is a great
> safety mechanism, for very large datasets where the query's selectivity is
> high, the sentinel population can be redundant and can become a large enough
> bottleneck in itself. Does it make sense to introduce a new parameter in
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and
> does not populate sentinels?
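
To make the trade-off in the description concrete, here is a minimal sketch
that contrasts a queue pre-filled with sentinel entries against one that grows
only as hits arrive. It is an illustration under stated assumptions, not
Lucene's HitQueue and not the collector added by the commit: it uses
java.util.PriorityQueue over bare scores, and the class name SentinelSketch
and its methods are hypothetical.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; this is NOT Lucene's HitQueue or the committed collector.
public class SentinelSketch {

    /** Eager strategy: allocate and insert numHits sentinel entries up front. */
    public static PriorityQueue<Float> prePopulated(int numHits) {
        if (numHits < 1) {
            throw new IllegalArgumentException("numHits must be >= 1");
        }
        PriorityQueue<Float> pq = new PriorityQueue<>(numHits);
        for (int i = 0; i < numHits; i++) {
            // The sentinel compares lower than any real score.
            pq.add(Float.NEGATIVE_INFINITY);
        }
        return pq;
    }

    /** Lazy strategy: start empty and grow only as hits are collected. */
    public static PriorityQueue<Float> lazy() {
        return new PriorityQueue<>();
    }

    /** With sentinels the queue is always full, so collection needs no size check. */
    public static void collectEager(PriorityQueue<Float> pq, float score) {
        if (score > pq.peek()) {
            pq.poll();      // drop the current minimum (a sentinel or a weaker hit)
            pq.add(score);
        }
    }

    /** Without sentinels every collected hit pays for a size check first. */
    public static void collectLazy(PriorityQueue<Float> pq, int numHits, float score) {
        if (pq.size() < numHits) {
            pq.add(score);
        } else if (score > pq.peek()) {
            pq.poll();
            pq.add(score);
        }
    }

    public static void main(String[] args) {
        int numHits = 100_000;              // many hits requested
        float[] scores = {1.5f, 0.2f, 3.0f}; // few documents actually match

        PriorityQueue<Float> eager = prePopulated(numHits);
        PriorityQueue<Float> lazy = lazy();
        for (float s : scores) {
            collectEager(eager, s);
            collectLazy(lazy, numHits, s);
        }

        System.out.println("eager entries: " + eager.size());
        System.out.println("lazy entries:  " + lazy.size());
    }
}
```

With 100,000 hits requested and three matching documents, the eager queue
holds 100,000 entries while the lazy one holds 3. The price of the lazy
variant is the extra size check on each collected hit, which is the branch
the sentinels exist to avoid.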