+1 from me too, this will be a really helpful feature.

I've done some background research and found a couple of aspects that are tricky. If the filter only matches a small percentage of documents, HNSW can quickly degrade to a brute-force scan. With live docs this isn't a big problem, because our merge policies keep deleted docs down to a reasonable percentage. But an arbitrary query could easily filter away most documents, leading to a surprisingly slow kNN search.

This blog post on the Weaviate engine has a graph showing a slowdown past ~20% filter selectivity: https://towardsdatascience.com/effects-of-filtered-hnsw-searches-on-recall-and-latency-434becf8041c

Looking forward to discussing more on the issue.
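To make the concern concrete, here is a rough sketch of the intersection Michael describes, plus a way to measure how selective the combined filter ends up being. This is not the proposed API: the `Bits` interface below is a local stand-in that only mirrors the shape of Lucene's `Bits` (`get` and `length`) so the example compiles on its own, and `intersect`, `selectivity`, and `of` are hypothetical helper names.

```java
public class FilterBitsSketch {

    /** Local stand-in mirroring the shape of Lucene's Bits; redefined so the sketch is self-contained. */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Hypothetical helper: accept a doc only if it is live AND matches the user filter. */
    static Bits intersect(Bits liveDocs, Bits filter) {
        if (liveDocs == null) return filter;   // null liveDocs means the segment has no deletions
        if (filter == null) return liveDocs;   // no filter supplied: same as today's behavior
        if (liveDocs.length() != filter.length()) {
            throw new IllegalArgumentException("liveDocs and filter must cover the same docs");
        }
        return new Bits() {
            @Override public boolean get(int index) {
                return liveDocs.get(index) && filter.get(index);
            }
            @Override public int length() {
                return liveDocs.length();
            }
        };
    }

    /** Hypothetical helper: fraction of docs in the segment that the graph search may return. */
    static double selectivity(Bits accept) {
        if (accept.length() == 0) return 0.0;
        int accepted = 0;
        for (int i = 0; i < accept.length(); i++) {
            if (accept.get(i)) accepted++;
        }
        return (double) accepted / accept.length();
    }

    /** Wraps a boolean array as Bits, for the demo only. */
    static Bits of(boolean... values) {
        return new Bits() {
            @Override public boolean get(int index) { return values[index]; }
            @Override public int length() { return values.length; }
        };
    }

    public static void main(String[] args) {
        // Ten docs; doc 2 has been deleted.
        Bits liveDocs = of(true, true, false, true, true, true, true, true, true, true);
        // The user filter matches docs 0 and 2 only.
        Bits filter = of(true, false, true, false, false, false, false, false, false, false);

        Bits accept = intersect(liveDocs, filter);
        // Only doc 0 is both live and matching, so 1 of 10 docs is accepted.
        System.out.println("selectivity = " + selectivity(accept));
    }
}
```

With liveDocs alone the accepted fraction stays high, but once a user filter is intersected in, it can drop arbitrarily low, which is the case where the graph search has to visit many nodes to collect k accepted hits.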
Julie

On Wed, Jan 19, 2022 at 12:10 PM Joel Bernstein <[email protected]> wrote:

> https://issues.apache.org/jira/browse/LUCENE-10382
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 19, 2022 at 2:59 PM Joel Bernstein <[email protected]> wrote:
>
>> Ok, I can create the jira.
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Wed, Jan 19, 2022 at 2:49 PM Michael Sokolov <[email protected]>
>> wrote:
>>
>>> +1 we should extend the functionality to support any Bits, not just
>>> liveDocs; we need to propose an API. The implementation should not be
>>> too hard - we need to intersect the user-supplied Bits with liveDocs
>>> and use that to filter.
>>>
>>> On Wed, Jan 19, 2022 at 1:42 PM Joel Bernstein <[email protected]>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Thanks for all the work on the vector search!
>>> >
>>> > I was wondering if there was a way using KnnVectorQuery to filter the
>>> > docs this query looks at. Right now the searchLeaf method passes in the
>>> > liveDocs to LeafReader.searchNearestVectors, but there appears to be no way
>>> > to have the KnnVectorQuery operate on a subset of liveDocs.
>>> >
>>> > Thanks,
>>> >
>>> > Joel Bernstein
>>> > http://joelsolr.blogspot.com/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
