[
https://issues.apache.org/jira/browse/SOLR-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Benedetti updated SOLR-17815:
----------------------------------------
Fix Version/s: 10.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Add parameter to regulate for ACORN-based filtering in vector search?
> ---------------------------------------------------------------------
>
> Key: SOLR-17815
> URL: https://issues.apache.org/jira/browse/SOLR-17815
> Project: Solr
> Issue Type: New Feature
> Components: vector-search
> Reporter: Alessandro Benedetti
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> ACORN is an interesting approach to optimised filtered vector search:
> https://arxiv.org/abs/2403.04871
> ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and
> Structured Data
> Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia
> h1. LUCENE IMPLEMENTATION
> This was implemented in Lucene with
> https://github.com/apache/lucene/pull/14160
> Specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher
> that can be used in Solr via
> org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw
> /**
>  * Create a new Hnsw strategy
>  *
>  * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>  *     100 where 0 means never use filtered search and 100 means always use filtered search.
>  */
> public Hnsw(int filteredSearchThreshold) {
>   if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
>     throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
>   }
>   this.filteredSearchThreshold = filteredSearchThreshold;
> }
> h1. DEFAULT
> ACORN with a threshold of '60' is the default when we upgrade to Lucene 10.x.
> {code:java}
> /**
>  * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
>  * given field. <code>target</code> vector.
>  *
>  * @param field a field that has been indexed as a {@link KnnFloatVectorField}.
>  * @param target the target of the search
>  * @param k the number of documents to find
>  * @param filter a filter applied before the vector search
>  * @throws IllegalArgumentException if <code>k</code> is less than 1
>  */
> public KnnFloatVectorQuery(String field, float[] target, int k, Query filter) {
>   this(field, target, k, filter, DEFAULT);
> }
> {code}
> Note the DEFAULT argument: it is
> public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
> where Hnsw is org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw,
> so that is the default search strategy.
> Now, what does the '60' threshold mean?
> {code:java}
>  * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>  *     100 where 0 means never use filtered search and 100 means always use filtered search.
> {code}
> So with 0 there is no ACORN search at all.
> With anything greater than 0, Lucene decides whether to use ACORN based on this
> condition:
> {code:java}
> // see org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw#useFilteredSearch
> if (acceptOrds != null
>     // We can only use filtered search if we know the maxConn
>     && graph.maxConn() != HnswGraph.UNKNOWN_MAX_CONN
>     && filteredDocCount > 0
>     && hnswStrategy.useFilteredSearch((float) filteredDocCount / graph.size())) {
>   innerSearcher =
>       FilteredHnswGraphSearcher.create(knnCollector.k(), graph, filteredDocCount, acceptOrds);
> {code}
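
To make the effect of the threshold concrete, here is a minimal, self-contained Java sketch of the decision. It does not use Lucene's classes: AcornThresholdSketch is a hypothetical stand-in, and the comparison rule (use filtered search when the percentage of documents passing the filter is below the threshold) is an assumption inferred from the Javadoc and the condition quoted above, so it should be checked against the Lucene source before being relied on. The acceptOrds and maxConn checks are left out for brevity.

```java
/** Hypothetical stand-in for the threshold check; this is NOT Lucene's class. */
final class AcornThresholdSketch {
  private final int filteredSearchThreshold;

  AcornThresholdSketch(int filteredSearchThreshold) {
    // Same range validation as the constructor quoted in the issue.
    if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
      throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
    }
    this.filteredSearchThreshold = filteredSearchThreshold;
  }

  /**
   * Assumed rule: choose filtered (ACORN) search when the percentage of
   * documents accepted by the filter is strictly below the threshold.
   */
  boolean useFilteredSearch(int filteredDocCount, int graphSize) {
    if (filteredDocCount <= 0 || graphSize <= 0) {
      return false; // nothing passes the filter, or the graph is empty
    }
    float ratio = (float) filteredDocCount / graphSize;
    return ratio * 100 < filteredSearchThreshold;
  }
}
```

Under that assumption, a threshold of 60 would select ACORN for a filter that matches 10% of the graph but not for one that matches 90%, and a threshold of 0 would never select it.
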
> ACORN can be disabled at the KnnSearchStrategy level by passing '0' as
> the threshold.
> public static class Hnsw extends KnnSearchStrategy {
>   public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
> h1. SCOPE OF THIS ISSUE
> This issue should study when ACORN is useful and when it is not, and whether the
> default is good enough for Solr.
> If it is not, the expected result of this task is a detailed motivation and the
> implementation of a parameter that lets users disable or regulate the ACORN
> behavior.
> Flexibility is great, but the additional complexity may not be necessary.
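
As a rough illustration of what such a parameter could involve, the sketch below reads an optional threshold from a map of query parameters, falls back to the default of 60 mentioned above, and rejects values outside 0 to 100. The class name, the parameter name "filteredSearchThreshold", and the use of a plain Map are all assumptions made for this example; none of them is taken from Solr's query parser code.

```java
import java.util.Map;

/** Hypothetical parameter handling; the names here are illustrative, not Solr's API. */
final class FilteredSearchThresholdParam {
  static final String PARAM = "filteredSearchThreshold"; // assumed parameter name
  static final int DEFAULT_THRESHOLD = 60; // default stated in the issue description

  /** Returns the requested threshold, or the default when the parameter is absent. */
  static int parse(Map<String, String> params) {
    String raw = params.get(PARAM);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_THRESHOLD;
    }
    final int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(PARAM + " must be an integer, got: " + raw, e);
    }
    if (value < 0 || value > 100) {
      throw new IllegalArgumentException(PARAM + " must be >= 0 and <= 100, got: " + value);
    }
    return value;
  }
}
```

With this shape, omitting the parameter keeps the default behavior, and passing 0 is the explicit way to turn ACORN off.
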