Hi Toke. Indeed a nice coincidence. It's an interesting and fun problem
space!
My implementation isn't specific to any particular dataset or access
pattern (i.e. infinite vs. subset).
So far the plugin supports exact L1, L2, Jaccard, Hamming, and Angular
similarities with LSH variants for all but
Thanks Michael. I managed to translate the TermInSetQuery into Scala
yesterday so now I can modify it in my codebase. This seems promising so
far. Fingers crossed there's a way to maintain scores without basically
converging to the BooleanQuery implementation.
- AK
On Wed, Jun 24, 2020 at 8:40 AM
On Tue, 2020-06-23 at 09:50 -0400, Alex K wrote:
> I'm working on an Elasticsearch plugin (using Lucene internally) that
> allows users to index numerical vectors and run exact and approximate
> k-nearest-neighbors similarity queries.
Quite a coincidence. I'm looking into the same thing :-)
> 1
Yeah, that will require some changes, since what it currently does is
maintain a bitset and OR into it repeatedly (once for each term's
docs). To maintain counts, you'd need a counter per doc (rather than a
bit), and you might lose some of the speed...
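
To make the bitset-versus-counter trade-off concrete, here is a minimal
sketch in plain Python. It does not use Lucene and is not the actual
TermInSetQuery code. The postings dictionary, the term names, and the
helper names match_any and match_counts are all made up for illustration.

```python
from collections import Counter

# Hypothetical postings lists: term -> set of doc ids containing that term.
# In an LSH setting the "terms" would be hash values of the indexed vectors.
postings = {
    "h1": {0, 2},
    "h2": {0, 1},
    "h3": {0, 2, 3},
}

def match_any(postings, query_terms):
    """Bitset-style matching: OR each term's docs into one set.

    Only records whether a doc matched at least one term, so the number
    of matching terms per doc is lost.
    """
    matched = set()
    for term in query_terms:
        matched |= postings.get(term, set())
    return matched

def match_counts(postings, query_terms):
    """Counter-style matching: one counter per doc instead of one bit.

    Keeps the number of query terms each doc matched, which can serve
    as a score, at the cost of more memory and work per posting.
    """
    counts = Counter()
    for term in query_terms:
        counts.update(postings.get(term, ()))
    return counts

query = ["h1", "h2", "h3"]
any_hits = match_any(postings, query)
count_hits = match_counts(postings, query)
top2 = count_hits.most_common(2)  # the two docs sharing the most terms
```

With the toy data above, doc 0 matches all three terms and doc 2 matches
two, so the counter version can rank them, while the set version only
reports that docs 0 through 3 matched something.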
On Tue, Jun 23, 2020 at 8:52 PM Alex K wrote: