We have SortedSetDocValuesField.newSlowRangeQuery() which does something close to what you want here, I think.
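
For illustration, here is a rough, Lucene-free sketch of the per-document check described in the quoted message below, for an arbitrary set of keys rather than a range. It is not Lucene code: the class, the method, and the array-based stand-in for a SORTED doc values field (a sorted term dictionary plus a doc-to-ordinal array, with -1 meaning "no value") are all hypothetical, and the keys are made up. The idea is to resolve the excluded keys to ordinals once, then test each candidate doc's ordinal against a bit set during collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

public class SlowInSetSketch {

    /**
     * Returns the candidate docs whose key is NOT in the excluded set.
     *
     * sortedTerms : hypothetical stand-in for the sorted term dictionary
     * docToOrd    : ordinal of each doc's value, or -1 if the doc has no value
     */
    static List<Integer> collectSurvivors(String[] sortedTerms, int[] docToOrd,
                                          int[] candidateDocs, Set<String> excluded) {
        // Resolve each excluded key to its ordinal once, up front.
        // Keys that are not in the dictionary can never match and are dropped.
        BitSet excludedOrds = new BitSet(sortedTerms.length);
        for (String key : excluded) {
            int ord = Arrays.binarySearch(sortedTerms, key);
            if (ord >= 0) {
                excludedOrds.set(ord);
            }
        }

        // "Post-filter" step: one ordinal lookup and one bit test per doc,
        // without touching the inverted index.
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidateDocs) {
            int ord = docToOrd[doc];
            if (ord >= 0 && excludedOrds.get(ord)) {
                continue; // doc carries an excluded key, skip it
            }
            kept.add(doc);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] terms = {"asin1", "asin2", "asin3", "asin4"}; // made-up keys
        int[] docToOrd = {2, 0, -1, 3, 1};
        int[] candidates = {0, 1, 2, 3, 4};
        System.out.println(
            collectSurvivors(terms, docToOrd, candidates, Set.of("asin1", "asin4", "asin9")));
    }
}
```

With the sample data in main, docs 1 and 3 carry excluded keys, so the survivors are docs 0, 2 and 4. The cost per doc is independent of how many keys are excluded, which is the property that makes this attractive for very large exclusion sets.
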

> On 26 Oct 2021, at 15:23, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> Hi Team,
>
> I was discussing this problem with Greg Miller (also at Amazon Product
> Search):
>
> If I want to make a query that filters out a few primary keys (ASIN in our
> Amazon Product Search world), I can make a TermInSetQuery and add it as a
> MUST_NOT onto a BooleanQuery that has all the other interesting clauses for
> my query.
>
> But if I have many, many ASINs to filter out, at some point it may become
> more efficient to just use doc values and filter them out like Solr's
> "post-filter" / during collection, e.g. by loading the BINARY value or SORTED
> (globalized) ordinal, and checking e.g. a HashSet to see if it should be
> skipped. Not using the inverted index at all...
>
> Do we already have such a "slow DV TermInSet" query?
>
> It seems like it could belong in SortedDocValues where we already have
> newSlowRangeQuery, newSlowExactQuery, we could add a newSlowInSetQuery?
>
> And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery
> and this SDV.newSlowInSetQuery?
>
> Or maybe there is already a good way to do this in Lucene?
>
> Thanks!,
>
> Mike McCandless
>
> http://blog.mikemccandless.com