Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Alex K
Hi Toke. Indeed a nice coincidence. It's an interesting and fun problem space! My implementation isn't specific to any particular dataset or access pattern (i.e. infinite vs. subset). So far the plugin supports exact L1, L2, Jaccard, Hamming, and Angular similarities with LSH variants for all but

Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Alex K
Thanks Michael. I managed to translate the TermInSetQuery into Scala yesterday so now I can modify it in my codebase. This seems promising so far. Fingers crossed there's a way to maintain scores without basically converging to the BooleanQuery implementation.
- AK
On Wed, Jun 24, 2020 at 8:40 AM

Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Toke Eskildsen
On Tue, 2020-06-23 at 09:50 -0400, Alex K wrote:
> I'm working on an Elasticsearch plugin (using Lucene internally) that
> allows users to index numerical vectors and run exact and approximate
> k-nearest-neighbors similarity queries.
Quite a coincidence. I'm looking into the same thing :-)
> 1

Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Michael Sokolov
Yeah, that will require some changes, since what it currently does is maintain a bitset and OR into it repeatedly (once for each term's docs). To maintain counts, you'd need a counter per doc (rather than a bit), and you might lose some of the speed...
On Tue, Jun 23, 2020 at 8:52 PM Alex K wro
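The bitset-versus-counter trade-off described in this message can be illustrated with a minimal sketch. It assumes each term's postings are available as a plain array of doc IDs. The class and method names are hypothetical, and this is not the actual TermInSetQuery code from Lucene, only an illustration of the idea.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; not Lucene's TermInSetQuery implementation.
final class DocMatchCounting {

    // Bitset approach: OR every term's doc IDs into one bitset.
    // This records only whether a doc matched at least one term (1 bit per doc).
    static BitSet unionOfPostings(int maxDoc, int[][] postingsPerTerm) {
        BitSet matched = new BitSet(maxDoc);
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                matched.set(doc);
            }
        }
        return matched;
    }

    // Counter approach: keep one counter per doc so the number of matching
    // terms survives. A short is assumed to be wide enough for hundreds of
    // terms, but it costs 16 bits per doc instead of 1.
    static short[] countMatchingTerms(int maxDoc, int[][] postingsPerTerm) {
        short[] counts = new short[maxDoc];
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                counts[doc]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up postings for three terms over six docs.
        int[][] postings = { {0, 2, 3}, {2, 3}, {3, 5} };
        System.out.println(unionOfPostings(6, postings));
        System.out.println(Arrays.toString(countMatchingTerms(6, postings)));
    }
}
```

The counter array keeps the per-doc match count that a score could be derived from, at the price of more memory per doc and a read-modify-write on every posting instead of a single bit set.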