Re: Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
The TermInSetQuery is definitely faster. Unfortunately it doesn't seem to return the number of terms that matched in a given document. Rather it just returns the boost value. I'll look into copying/modifying the internals to return the number of matched terms. Thanks - AK On Tue, Jun 23, 2020 at
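
To illustrate the difference described above, here is a toy model in plain Python. It is not Lucene code and uses no Lucene API: `build_index`, `count_matching_terms`, and `match_any_term` are hypothetical names, and the sketch assumes each term clause contributes a constant score of 1. The first function mimics a disjunction of per-term clauses, where a document's score is the number of query terms it contains; the second mimics a set-membership query, where every matching document gets the same boost.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def count_matching_terms(index, query_terms):
    """Score each document by how many distinct query terms it contains.

    Assumption: every clause scores a constant 1, so the sum over
    matching clauses equals the number of matched terms.
    """
    counts = defaultdict(int)
    for term in set(query_terms):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return dict(counts)


def match_any_term(index, query_terms, boost=1.0):
    """Give every document that contains at least one query term the same score."""
    matched = set()
    for term in set(query_terms):
        matched |= index.get(term, set())
    return {doc_id: boost for doc_id in matched}


# Hypothetical documents, each holding a few hash terms in one field.
docs = {"a": ["h1", "h2", "h3"], "b": ["h2", "h9"], "c": ["h7"]}
index = build_index(docs)
query = ["h1", "h2", "h3", "h4"]

counts = count_matching_terms(index, query)  # per-document match counts
flat = match_any_term(index, query)          # constant score per matching document
```

In this model both functions select the same documents, but only the first preserves the per-document match count that a ranking step would need.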

Re: Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
Hi Michael, Thanks for the quick response! I will look into the TermInSetQuery. My usage of "heap" might've been confusing. I'm using a FunctionScoreQuery from Elasticsearch. This gets instantiated with a Lucene query, in this case the boolean query as I described it, as well as a custom ScoreFun

Re: Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Michael Sokolov
You might consider using a TermInSetQuery in place of a BooleanQuery for the hashes (since they are all in the same field). I don't really understand why you are seeing so much cost in the heap - it sounds as if you have a single heap with mixed scores - those generated by the BooleanQuery and t

Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
Hello all, I'm working on an Elasticsearch plugin (using Lucene internally) that allows users to index numerical vectors and run exact and approximate k-nearest-neighbors similarity queries. I'd like to get some feedback about my usage of BooleanQueries and TermQueries, and see if there are any op