I reduced the benchmark as far as I could and now get the results below, with TermInSetQuery a lot slower than the TermQuery/SHOULD variant.

Benchmark                             Mode  Cnt       Score       Error  Units
BenchmarkOrQuery.benchmarkTerms       thrpt    5  190820.510 ± 16667.411  ops/s
BenchmarkOrQuery.benchmarkTermsInSet  thrpt    5  110548.345 ±  7490.169  ops/s

    // Variant 1: one TermQuery per role, combined with Occur.SHOULD.
    @Fork(1)
    @Measurement(iterations = 5, time = 10)
    @OutputTimeUnit(TimeUnit.SECONDS)
    @Warmup(iterations = 3, time = 1)
    @Benchmark
    public void benchmarkTerms(final MyState myState) {
        try {
            final IndexSearcher searcher = myState.matchedReaders.getIndexSearcher();
            final BooleanQuery.Builder b = new BooleanQuery.Builder();
            for (final String role : myState.user.getAdditionalRoles()) {
                // "roles" here is the field-name constant BenchmarkOrQuery.roles
                b.add(new TermQuery(new Term(roles, new BytesRef(role))), BooleanClause.Occur.SHOULD);
            }
            searcher.count(b.build());
        } catch (final IOException e) {
            e.printStackTrace();
        }
    }

    // Variant 2: the same roles passed as a single TermInSetQuery.
    @Fork(1)
    @Measurement(iterations = 5, time = 10)
    @OutputTimeUnit(TimeUnit.SECONDS)
    @Warmup(iterations = 3, time = 1)
    @Benchmark
    public void benchmarkTermsInSet(final MyState myState) {
        try {
            final IndexSearcher searcher = myState.matchedReaders.getIndexSearcher();
            // The local "roles" shadows the field-name constant, hence the qualified name below.
            final Set<BytesRef> roles =
                myState.user.getAdditionalRoles().stream().map(BytesRef::new).collect(Collectors.toSet());
            searcher.count(new TermInSetQuery(BenchmarkOrQuery.roles, roles));
        } catch (final IOException e) {
            e.printStackTrace();
        }
    }

On Tue, Oct 13, 2020 at 11:56 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> Hello Adrien,
>
> Thanks for the swift reply. I'll add the details:
>
> Lucene version: 8.6.2
>
> The restrictionQuery is indeed a conjunction; it allows a document to
> be a hit if the 'roles' field is empty as well. It's used within a
> bigger query builder, so maybe I did something else wrong. I'll rewrite the
> benchmark to just benchmark the TermsInSet and Terms.
>
> It never occurred (hah) to me to use Occur.FILTER; that is a good point to
> check as well.
>
> As you put it, I would expect the results to be very similar, as I do not
> reach the 16 terms in the TermInSet. I'll let you know what I find.
>
> On Tue, Oct 13, 2020 at 11:48 AM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Can you give us a few more details:
>> - What version of Lucene are you testing?
>> - Are you benchmarking "restrictionQuery" on its own, or its conjunction
>> with another query?
>>
>> You mentioned that you combine your "restrictionQuery" and the user query
>> with Occur.MUST. Occur.FILTER feels more appropriate for "restrictionQuery"
>> since it should not contribute to scoring.
>>
>> TermInSetQuery automatically executes like a BooleanQuery when the number
>> of clauses is less than 16, so I would not expect major performance
>> differences between a TermInSetQuery over less than 16 terms and a
>> BooleanQuery wrapped in a ConstantScoreQuery.
>>
>> On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde <rob.audenae...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> > I'm benchmarking an application which implements security on Lucene by
>> > adding a multi-value field "roles". If the user has one of these roles, he
>> > can find the document.
>> >
>> > I implemented this as a Boolean AND query, adding the original query and
>> > the restriction with Occur.MUST.
>> >
>> > I'm having some performance issues when counting the index (>60M docs), so
>> > I thought about tweaking this restriction implementation.
>> >
>> > I set up a benchmark like this:
>> >
>> > I generate 2M documents. Each document has a multi-value "roles" field. The
>> > "roles" field in each document has 4 values, taken from (2,2,1000,100)
>> > unique values.
>> > The user has (1,1,2,1) values for roles (so, 1 out of the 2 for the first
>> > role, 1 out of 2 for the second, 2 out of the 1000 for the third value, and
>> > 1 / 100 for the fourth).
>> >
>> > I got a somewhat unexpected performance difference. At first, I implemented
>> > the restriction query like this:
>> >
>> > for (final String role : roles) {
>> >     restrictionQuery.add(new TermQuery(new Term("roles", new BytesRef(role))), Occur.SHOULD);
>> > }
>> >
>> > I then switched to a TermInSetQuery, which I thought would be faster
>> > as it is using constant-scores.
>> >
>> > final Set<BytesRef> rolesSet =
>> >     roles.stream().map(BytesRef::new).collect(Collectors.toSet());
>> > restrictionQuery.add(new TermInSetQuery("roles", rolesSet), Occur.SHOULD);
>> >
>> >
>> > However, the TermInSetQuery has about 25% slower ops/s. Is that to
>> > be expected? I did not expect it, as I thought the constant-scoring would
>> > be faster.
>> >
>>
>>
>> --
>> Adrien
>
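
For context on what each count() call has to do, here is a rough back-of-the-envelope sketch of how many documents the role restriction should match in the benchmark index described in the first message. It assumes each of the four role slots is drawn uniformly and independently from its pool of (2,2,1000,100) values and that the pools do not overlap; the thread does not say how the values are actually generated, so treat that as an assumption. The class and method names (RoleMatchEstimate, matchProbability) are made up for illustration and are not part of the benchmark.

```java
import java.util.Locale;

/** Estimates how many benchmark documents match at least one of the user's roles. */
final class RoleMatchEstimate {

    /**
     * Probability that a document matches at least one user role.
     * ASSUMPTION: every role slot is filled uniformly and independently from
     * its own pool of values; the thread does not state the distribution.
     *
     * @param uniqueValues number of distinct values available per role slot
     * @param userValues   number of those values the user holds per slot
     */
    static double matchProbability(final int[] uniqueValues, final int[] userValues) {
        if (uniqueValues.length != userValues.length) {
            throw new IllegalArgumentException("both arrays need one entry per role slot");
        }
        // Chance that the document misses the user's roles in every slot.
        double missAll = 1.0;
        for (int i = 0; i < uniqueValues.length; i++) {
            missAll *= 1.0 - (double) userValues[i] / uniqueValues[i];
        }
        return 1.0 - missAll;
    }

    public static void main(final String[] args) {
        final int[] unique = {2, 2, 1000, 100}; // pool sizes from the first message
        final int[] user = {1, 1, 2, 1};        // roles held by the benchmark user
        final long docs = 2_000_000L;           // index size from the first message

        final double p = matchProbability(unique, user);
        System.out.printf(Locale.ROOT, "match probability: %.4f%n", p);
        System.out.printf(Locale.ROOT, "expected hits: %d of %d%n", Math.round(p * docs), docs);
    }
}
```

Under those assumptions roughly three quarters of the 2M documents match, so both benchmark variants count a large result set, and the user's 1+1+2+1 = 5 terms stay below the 16-term figure mentioned above.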