[ https://issues.apache.org/jira/browse/LUCENE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795905#comment-16795905 ]
Adrien Grand commented on LUCENE-7958:
--------------------------------------
Thanks for sharing [~hermes]. I should resurrect the above patch when I have
some time!
> Give TermInSetQuery better advancing capabilities
> -------------------------------------------------
>
> Key: LUCENE-7958
> URL: https://issues.apache.org/jira/browse/LUCENE-7958
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7958.patch
>
>
> If a TermInSetQuery has more than 15 matching terms on a given segment, then
> we consume all postings lists into a bitset and return an iterator over this
> bitset as a scorer. I would like to change it so that we keep the 15 postings
> lists that have the largest document frequencies and consume all other
> (shorter) postings lists into a bitset. In the end we return a disjunction
> over these 15 longest postings lists and the bitset. This could help consume
> fewer doc IDs if the TermInSetQuery is intersected with other queries,
> especially if the document frequencies of the terms it wraps follow a Zipfian
> distribution.
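
The proposal in the description can be illustrated with a small sketch. This is not the attached patch and does not use Lucene's APIs: postings lists are modelled as sorted Python lists of doc IDs, the "bitset" is a plain set, and the names split_postings, disjunction and max_kept are hypothetical. Treating the case of 15 or fewer terms as "keep everything" is also an assumption of this sketch.

```python
import heapq

MAX_KEPT = 15  # threshold mentioned in the issue description


def split_postings(postings, max_kept=MAX_KEPT):
    """Keep the max_kept longest postings lists; fold the rest into a set.

    `postings` is a list of sorted lists of doc IDs (an assumed model,
    not Lucene's PostingsEnum). The returned set stands in for the bitset.
    """
    if len(postings) <= max_kept:
        return list(postings), set()
    # Longest lists first; list length plays the role of document frequency.
    by_length = sorted(postings, key=len, reverse=True)
    kept = by_length[:max_kept]
    bitset = set()
    for short_list in by_length[max_kept:]:
        bitset.update(short_list)
    return kept, bitset


def disjunction(kept, bitset):
    """Yield each matching doc ID once, in increasing order."""
    last = -1  # doc IDs are assumed to be non-negative
    for doc in heapq.merge(*kept, sorted(bitset)):
        if doc != last:
            yield doc
            last = doc


if __name__ == "__main__":
    postings = [[1, 5, 9, 12], [5], [2, 9], [7]]
    kept, rest = split_postings(postings, max_kept=2)
    print(kept, sorted(rest), list(disjunction(kept, rest)))
```

Only the short lists are consumed eagerly here; the long lists stay lazy behind heapq.merge, which is where the savings would come from if a real implementation could skip ahead in them when intersected with another query.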