Hi Kumar,

kNN search in Apache Solr doesn't support any min-threshold parameter. To be honest, even if it did, you wouldn't be in a much better position: your perceived relevance won't necessarily match the 0-1 cosine similarity between your query and your vectors, and what you consider highly relevant may have a score of 0.35 for one query and 0.96 for another. Having such a parameter just delegates to the user the pain of setting a useful threshold, which, trust me, is not an easy (or maybe even doable) job.
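To make that concrete, here is a minimal Python sketch of a client-side cutoff applied to results you have already fetched (for example with `fl=id,score`). The helper `filter_by_min_score`, the document ids, the scores and the relevance judgments in the comments are all made up for illustration; nothing here is a Solr feature.

```python
from typing import Dict, List


def filter_by_min_score(docs: List[Dict], min_score: float) -> List[Dict]:
    """Client-side cut: keep only docs whose 'score' is at least min_score."""
    return [d for d in docs if d["score"] >= min_score]


# Hypothetical kNN results for two different queries (scores are invented).
results_query_a = [
    {"id": "a1", "score": 0.96},  # a human would call this relevant
    {"id": "a2", "score": 0.91},  # a human would call this NOT relevant
]
results_query_b = [
    {"id": "b1", "score": 0.35},  # relevant, despite the low score
    {"id": "b2", "score": 0.20},  # not relevant
]

strict, loose = 0.93, 0.30  # two candidate global thresholds (assumptions)

kept_a_strict = [d["id"] for d in filter_by_min_score(results_query_a, strict)]
kept_b_strict = [d["id"] for d in filter_by_min_score(results_query_b, strict)]
kept_a_loose = [d["id"] for d in filter_by_min_score(results_query_a, loose)]
kept_b_loose = [d["id"] for d in filter_by_min_score(results_query_b, loose)]

print("strict:", kept_a_strict, kept_b_strict)
print("loose: ", kept_a_loose, kept_b_loose)
```

With these invented numbers, the strict threshold is right for the first query but drops the relevant document of the second one, while the loose threshold rescues the second query and lets the irrelevant document of the first one through. That is the tuning problem a global threshold hands to the user.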
It's on my roadmap to add a sort of auto-cutting functionality based on the document score, and Lucene has also added a threshold-based search (which we may or may not port to Apache Solr).

In the meantime, you can play with Hybrid Search (which will also be improved in the future):
https://sease.io/2023/12/hybrid-search-with-apache-solr.html

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>

On Fri, 26 Jan 2024 at 17:01, Charlie Hull <ch...@opensourceconnections.com> wrote:

> Hi Kumar,
>
> kNN will return the k closest vectors, which as you've found out may not
> be very close at all. Most of the approaches we're seeing as we work
> with e-commerce clients involve combining kNN with a standard, lexical
> search in some way - combining the results from both, or using one to
> boost certain results. You might find this blog useful as it discusses
> some strategies for coping with what you've found:
>
> https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
>
> best
>
> Charlie
>
> On 26/01/2024 12:18, kumar gaurav wrote:
> > Hi Srijan
> >
> > Thanks for replying.
> >
> > I am using the BERT open source model to generate vectors. Are you aware of
> > any minSimilarity parameter threshold in the knn parser?
> >
> > I am working with an ecommerce dataset. So I am getting the same
> > non-relevant results and the same score if I am using any invalid search
> > token which is not present in my index.
> >
> > I want to apply some kind of minimum similarity threshold so I can
> > throw out the outliers and get only the very nearest documents.
> >
> > On Fri, 26 Jan 2024 at 17:05, Srijan <shree...@gmail.com> wrote:
> >
> >> I have been testing dense vector search on Solr and it's been working great
> >> for me so far. Mine is an image search use case using OpenAI's CLIP model
> >> but the configurations are pretty much the same as yours. What embedding
> >> model are you using? And can you share a portion of the actual query?
> >>
> >> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav <kg2...@gmail.com> wrote:
> >>
> >>> Hi Everyone
> >>>
> >>> I am using vector search in Solr 9.4. I am using cosine similarity with
> >>> the knn parser.
> >>>
> >>> Same as the documentation:
> >>> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> >>>
> >>> Schema
> >>> <fieldType name="knn_vector" class="solr.DenseVectorField"
> >>> vectorDimension="768" similarityFunction="cosine"/>
> >>> <field name="vector" type="knn_vector" indexed="true" stored="true"/>
> >>>
> >>> Query
> >>> q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
> >>>
> >>> The problem is that it always returns docs even if they are not relevant.
> >>> Even if I am using the xyz keyword, the knn parser returns documents,
> >>> which is useless. I want to control the similarity of documents. I need
> >>> highly similar documents only. Does Solr have any parameter in the knn
> >>> parser which controls the similarity threshold?
> >>>
> >>> *How can I control the minimum similarity threshold with the knn parser?*
> >>>
> >>> Please help. Thanks in advance.
> >>>
> >>> --
> >>> Thanks & Regards
> >>> Kumar Gaurav
> >>>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> Founding member of The Search Network and co-author of Searching the Enterprise
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>
> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> Amtsgericht Charlottenburg | HRB 230712 B
> Geschäftsführer: John M. Woodell | David E. Pugh
> Finanzamt: Berlin Finanzamt für Körperschaften II
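
A footnote on the "combine kNN with lexical search" advice from this thread: the sketch below only builds request parameters for two common shapes of hybrid query and sends nothing to Solr. It reuses the `vector` field and the `{!knn f=vector topK=10}` syntax from the original question; the `title` field, the `shoes` term and both helper names are hypothetical. The first shape relies on `fq` restricting the kNN candidates, which is how I read the reference guide for recent 9.x releases, so please verify it on your version. The second uses the `{!bool}` query parser to take the union of lexical and vector matches.

```python
from typing import Dict, List
from urllib.parse import urlencode


def knn_clause(vector: List[float], field: str = "vector", top_k: int = 10) -> str:
    """Render the knn query parser clause used in the thread."""
    rendered = "[" + ", ".join(str(v) for v in vector) + "]"
    return "{!knn f=" + field + " topK=" + str(top_k) + "}" + rendered


def knn_with_lexical_filter(vector: List[float], lexical_filter: str) -> Dict[str, str]:
    """kNN as the main query, restricted by a lexical filter query (fq).

    Assumption: if the filter matches nothing (e.g. an unknown token),
    no documents come back, instead of the k nearest unrelated ones.
    """
    return {"q": knn_clause(vector), "fq": lexical_filter, "fl": "id,score"}


def hybrid_union(vector: List[float], lexical_query: str) -> Dict[str, str]:
    """Union of lexical and vector matches via the bool query parser."""
    return {
        "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
        "lexicalQuery": lexical_query,
        "vectorQuery": knn_clause(vector),
        "fl": "id,score",
    }


query_vector = [1.0, 2.0, 3.0, 4.0]  # toy vector from the original question

filtered = knn_with_lexical_filter(query_vector, "title:shoes")
union = hybrid_union(query_vector, "{!edismax qf=title}shoes")

# URL-encoded query strings, ready to append to a /select request.
print(urlencode(filtered))
print(urlencode(union))
```

The filtered shape is the stricter of the two: a token that does not exist in the index yields an empty result set. The union shape keeps vector-only matches, so it still needs some re-ranking or boosting strategy on top.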