Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
Well, if, as I suggest, we use MultiTermQuery + DocValuesRewriteMethod to implement this, then the choice is yours. Just run it against a "slow IndexReader" and go through the ordinal map if you choose? There's nothing stopping you from doing that, and it will do what you want already. I just personal

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
There are times, particularly in ecommerce and access control, where speed really matters. So, you build stuff that's really fast at query time, with a tradeoff at commit time.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Oct 26, 2021 at 5:31 PM Robert Muir wrote:
> Sorry, I don't thi

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
Sorry, I don't think there is a need to use any top-level ordinals. None of these docvalues-based query implementations need it. As for a query intersecting an input stream, that is a big no-go. Lucene Queries need to have correct hashCode/equals/etc. That's why current stuff around this such as

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
One more wrinkle, for extremely large lists, is to pass the list in as an InputStream holding a presorted binary representation of the ASINs, slide a BytesRef across the stream, and merge it with the SortedDocValues. This saves on all the object creation and String overhead for really long lists of

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
If the list of ASINs is presorted, you can quickly merge it with the SortedDocValues and produce a FixedBitSet of the top-level ordinals, which can be used as the post filter. This is a nice approach for things like passing in a long list of access control predicates.
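
The merge described above can be sketched in plain Java, without Lucene on the classpath. This is only an illustration under stated assumptions: a sorted List<String> stands in for the SortedDocValues term dictionary, java.util.BitSet stands in for FixedBitSet, and the class and method names (SortedMergeSketch, matchingOrdinals) are hypothetical. String.compareTo is used for ordering, which agrees with Lucene's byte order for the ASCII keys shown here but not necessarily for arbitrary Unicode terms.

```java
import java.util.BitSet;
import java.util.List;

public class SortedMergeSketch {

    /**
     * Walks two ascending-sorted lists in lockstep and marks the position
     * (ordinal) of every dictionary term that also occurs in the query list.
     * Assumes the dictionary has no duplicates; the query list may have them.
     */
    static BitSet matchingOrdinals(List<String> sortedDictionary, List<String> sortedQueryTerms) {
        BitSet ords = new BitSet(sortedDictionary.size());
        int d = 0; // cursor into the dictionary (its index is the ordinal)
        int q = 0; // cursor into the presorted query terms
        while (d < sortedDictionary.size() && q < sortedQueryTerms.size()) {
            int cmp = sortedDictionary.get(d).compareTo(sortedQueryTerms.get(q));
            if (cmp == 0) {
                ords.set(d); // term is in both lists: record its ordinal
                d++;
                q++;
            } else if (cmp < 0) {
                d++; // dictionary term is smaller: not requested, skip it
            } else {
                q++; // query term is absent from the dictionary, skip it
            }
        }
        return ords;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("B001", "B002", "B005", "B009");
        List<String> query = List.of("B002", "B003", "B009");
        System.out.println(matchingOrdinals(dictionary, query)); // prints {1, 3}
    }
}
```

Each list is traversed once, so the cost is linear in the combined length of the two lists, and the resulting bit set can then be checked per document by ordinal.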

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Adrien Grand
I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these ideas. On Tue, Oct 26, 2021 at 7:52 PM Robert Muir wrote: > On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand wrote: > > > > > And then we could make an IndexOrDocValuesQuery with both the > TermInSetQuery and this SDV.newSlowIn

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand wrote:
>
> > And then we could make an IndexOrDocValuesQuery with both the
> > TermInSetQuery and this SDV.newSlowInSetQuery?
>
> Unfortunately IndexOrDocValuesQuery relies on the fact that the "index" query
> can evaluate its cost (ScorerSupplier#cos

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Adrien Grand
> And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery and this SDV.newSlowInSetQuery?

Unfortunately IndexOrDocValuesQuery relies on the fact that the "index" query can evaluate its cost (ScorerSupplier#cost) without doing anything costly, which isn't the case for TermInSet

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 11:24 AM Robert Muir wrote: > > On Tue, Oct 26, 2021 at 10:58 AM Alan Woodward wrote: > > > > We have SortedSetDocValuesField.newSlowRangeQuery() which does something > > close to what you want here, I think. > > > > See also DocValuesRewriteMethod which might be useful,

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 10:58 AM Alan Woodward wrote: > > We have SortedSetDocValuesField.newSlowRangeQuery() which does something > close to what you want here, I think. > See also DocValuesRewriteMethod which might be useful, at least as a start. You'd have to express the "SetQuery" as a Multi

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Alan Woodward
We have SortedSetDocValuesField.newSlowRangeQuery() which does something close to what you want here, I think.

> On 26 Oct 2021, at 15:23, Michael McCandless wrote:
>
> Hi Team,
>
> I was discussing this problem with Greg Miller (also at Amazon Product
> Se

Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Michael McCandless
Hi Team, I was discussing this problem with Greg Miller (also at Amazon Product Search): If I want to make a query that filters out a few primary keys (ASIN in our Amazon Product Search world), I can make a TermInSetQuery and add it as a MUST_NOT onto a BooleanQuery that has all the other interes