[ 
https://issues.apache.org/jira/browse/SOLR-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-17815:
----------------------------------------
    Fix Version/s: 10.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Add parameter to regulate for ACORN-based filtering in vector search?
> ---------------------------------------------------------------------
>
>                 Key: SOLR-17815
>                 URL: https://issues.apache.org/jira/browse/SOLR-17815
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>            Reporter: Alessandro Benedetti
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> ACORN is an interesting approach to optimised filtered vector search: 
> https://arxiv.org/abs/2403.04871
> ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and 
> Structured Data
> Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia
> h1. LUCENE IMPLEMENTATION
> ACORN was implemented in Lucene with 
> https://github.com/apache/lucene/pull/14160
> specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher.
> Solr can use it through the constructor of 
> org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw:
>     /**
>      * Create a new Hnsw strategy
>      *
>      * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>      *     100 where 0 means never use filtered search and 100 means always use filtered search.
>      */
>     public Hnsw(int filteredSearchThreshold) {
>       if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
>         throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
>       }
>       this.filteredSearchThreshold = filteredSearchThreshold;
>     }
> h1. DEFAULT
> Once Solr upgrades to Lucene 10.x, ACORN with a threshold of 60 becomes the 
> default.
> {code:java}
> /**
>  * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
>  * given field. <code>target</code> vector.
>  *
>  * @param field a field that has been indexed as a {@link KnnFloatVectorField}.
>  * @param target the target of the search
>  * @param k the number of documents to find
>  * @param filter a filter applied before the vector search
>  * @throws IllegalArgumentException if <code>k</code> is less than 1
>  */
> public KnnFloatVectorQuery(String field, float[] target, int k, Query filter) {
>   this(field, target, k, filter, DEFAULT);
> }
> {code}
> Note the DEFAULT argument in the constructor call. It is defined as
> public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
> where Hnsw is org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw,
> so that is the default search strategy.
> What does the threshold of 60 mean?
> {code:java}
>  * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>  *     100 where 0 means never use filtered search and 100 means always use filtered search.
> {code}
> With a threshold of 0, ACORN search is never used.
> With any value greater than 0, Lucene decides whether to use ACORN based on 
> this condition:
> {code:java}
> // Calls org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw#useFilteredSearch
> if (acceptOrds != null
>     // We can only use filtered search if we know the maxConn
>     && graph.maxConn() != HnswGraph.UNKNOWN_MAX_CONN
>     && filteredDocCount > 0
>     && hnswStrategy.useFilteredSearch((float) filteredDocCount / graph.size())) {
>   innerSearcher =
>       FilteredHnswGraphSearcher.create(knnCollector.k(), graph, filteredDocCount, acceptOrds);
> {code}
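
To make the condition above concrete, here is a minimal standalone sketch of how the threshold interacts with the ratio of documents passing the filter. It does not depend on Lucene. The class name AcornThresholdSketch is hypothetical, and the comparison inside useFilteredSearch (percentage passing the filter strictly below the threshold) is an assumption that should be checked against the Lucene source, since the issue description does not show the body of that method.

```java
// Standalone sketch, not Lucene code. The class name is hypothetical.
final class AcornThresholdSketch {
    private final int filteredSearchThreshold;

    AcornThresholdSketch(int filteredSearchThreshold) {
        // Same range check as the Hnsw constructor quoted in the issue.
        if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
            throw new IllegalArgumentException(
                "filteredSearchThreshold must be >= 0 and <= 100");
        }
        this.filteredSearchThreshold = filteredSearchThreshold;
    }

    // ASSUMPTION: filtered (ACORN) search is chosen when the percentage of
    // vectors passing the filter is strictly below the threshold.
    boolean useFilteredSearch(int filteredDocCount, int graphSize) {
        if (filteredDocCount <= 0 || graphSize <= 0) {
            return false; // nothing passes the filter, or the graph is empty
        }
        float ratioPassingFilter = (float) filteredDocCount / graphSize;
        return ratioPassingFilter * 100 < filteredSearchThreshold;
    }

    public static void main(String[] args) {
        AcornThresholdSketch dflt = new AcornThresholdSketch(60);
        AcornThresholdSketch off = new AcornThresholdSketch(0);
        System.out.println(dflt.useFilteredSearch(100, 1000)); // 10% pass: true
        System.out.println(dflt.useFilteredSearch(900, 1000)); // 90% pass: false
        System.out.println(off.useFilteredSearch(100, 1000));  // threshold 0: false
    }
}
```

Under this assumption, the default of 60 would select ACORN only for filters that keep fewer than 60% of the vectors in the graph, and a threshold of 0 would never select it.
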
> ACORN can be disabled at the KnnSearchStrategy level by passing 0 as the 
> threshold, instead of relying on the default:
>  public static class Hnsw extends KnnSearchStrategy {
>     public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
> h1. SCOPE OF THIS ISSUE
> This issue should study when ACORN is useful and whether the default 
> threshold is good enough for Solr.
> If it is not, the expected result of this task is a detailed motivation and 
> the implementation of a parameter that lets users disable or regulate the 
> ACORN behavior.
> Flexibility is valuable, but it may not justify the additional complexity.
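
As a rough illustration of what such a parameter could involve, the sketch below parses an optional request value into a threshold, falls back to the default of 60 mentioned in this issue, and rejects values outside 0 to 100. It is a standalone sketch under stated assumptions: the class name AcornParamSketch and the method parseThreshold are hypothetical, and nothing here is an existing Solr API or parameter name.

```java
// Standalone sketch, not Solr code. All names are hypothetical.
final class AcornParamSketch {
    // Default threshold as stated in this issue.
    static final int DEFAULT_THRESHOLD = 60;

    // rawValue is the request parameter as a string, or null when absent.
    static int parseThreshold(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return DEFAULT_THRESHOLD; // parameter not supplied: keep the default
        }
        final int value;
        try {
            value = Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "threshold must be an integer from 0 to 100, got: " + rawValue, e);
        }
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(
                "threshold must be >= 0 and <= 100, got: " + value);
        }
        return value; // 0 would disable ACORN, per the Hnsw Javadoc
    }

    public static void main(String[] args) {
        System.out.println(parseThreshold(null)); // 60
        System.out.println(parseThreshold("0"));  // 0
        System.out.println(parseThreshold(" 85 ")); // 85
    }
}
```

The parsed value could then be handed to the Hnsw constructor quoted earlier, which applies the same range check.
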



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
