Hi Kumar,
kNN search in Apache Solr doesn't support any min-threshold parameter.
To be honest, even if it did, you wouldn't be in a much better position:
your perceived relevance won't necessarily match the 0-1 cosine similarity
between your query and your vectors, and what you consider highly relevant
may have a score of 0.35 for one query and 0.96 for another.
Having such a parameter just delegates to the user the pain of setting up a
useful threshold, which, trust me, is not an easy (or maybe even doable) job.
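
If you still want to experiment with a cut-off, one option is to apply it
on the client side after the kNN results come back. The snippet below is
only a minimal sketch: the helper name, the sample documents and the 0.7
threshold are assumptions made for illustration, and it presumes you
requested fl=id,score so that each returned document carries its score.

```python
def filter_by_min_score(docs, min_score):
    """Keep only the documents whose 'score' is at least min_score."""
    return [doc for doc in docs if doc.get("score", 0.0) >= min_score]


# Hypothetical documents, shaped like response["response"]["docs"]
# in Solr's JSON output when fl=id,score is requested.
docs = [
    {"id": "1", "score": 0.96},
    {"id": "2", "score": 0.71},
    {"id": "3", "score": 0.35},
]

# 0.7 is an arbitrary value chosen for the example, not a recommendation.
kept = filter_by_min_score(docs, min_score=0.7)
print([doc["id"] for doc in kept])
```

Keep in mind the caveat above: a fixed value like 0.7 may be too strict for
one query and too loose for another, so treat this as a way to explore your
score distribution rather than as a fix.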

It's on my roadmap to add a sort of auto-cutting functionality based on the
document score, and Lucene has also added a threshold-based search (which we
may or may not port to Apache Solr).
In the meantime, you can play with Hybrid Search (which will also be
improved in the future):
https://sease.io/2023/12/hybrid-search-with-apache-solr.html
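
As a rough illustration of the hybrid idea, the sketch below builds request
parameters that combine a lexical clause and a kNN clause through Solr's
Boolean query parser. The helper name and the "name" lexical field are
hypothetical; the vector field and topK come from the schema and query
quoted further down. Treat it as a starting point under those assumptions,
not as the definitive way to do hybrid search.

```python
import json
from urllib.parse import urlencode


def hybrid_params(keywords, vector, lexical_field="name",
                  vector_field="vector", top_k=10):
    """Build Solr request parameters for a lexical + kNN hybrid query.

    lexical_field is a hypothetical text field; adapt it to your schema.
    """
    vector_literal = json.dumps(vector)  # e.g. "[1.0, 2.0, 3.0, 4.0]"
    return {
        # Union of both clauses: a document matching either one is returned.
        "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
        "lexicalQuery": f"{{!type=edismax qf={lexical_field}}}{keywords}",
        "vectorQuery": f"{{!knn f={vector_field} topK={top_k}}}{vector_literal}",
        "fl": "id,score",
    }


params = hybrid_params("xyz", [1.0, 2.0, 3.0, 4.0])
# Query string to append to your collection's /select endpoint.
print(urlencode(params))
```

With two should clauses the kNN hits are still returned even when the
keyword matches nothing. Changing the lexical clause to must=$lexicalQuery
is one way to drop results for tokens that are absent from the index, at
the cost of losing purely semantic matches.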

Cheers

--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Fri, 26 Jan 2024 at 17:01, Charlie Hull <ch...@opensourceconnections.com>
wrote:

> Hi Kumar,
>
> kNN will return the k closest vectors, which as you've found out may not
> be very close at all. Most of the approaches we're seeing as we work
> with e-commerce clients involve combining kNN with a standard, lexical
> search in some way - combining the results from both, or using one to
> boost certain results. You might find this blog post useful, as it
> discusses some strategies for coping with what you've found:
>
> https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
>
> best
>
> Charlie
>
>
> On 26/01/2024 12:18, kumar gaurav wrote:
> > Hi Srijan
> >
> > Thanks for replying.
> >
> > I am using the BERT open source model to generate vectors. Are you aware
> > of any minSimilarity parameter threshold in the knn parser?
> >
> > I am working with an ecommerce dataset. I am getting the same
> > non-relevant results and the same score if I use any invalid search
> > token that is not present in my index.
> >
> > I want to apply some kind of minimum similarity threshold so I can
> > throw out the outliers and get only the nearest documents.
> >
> >
> >
> > On Fri, 26 Jan 2024 at 17:05, Srijan <shree...@gmail.com> wrote:
> >
> >> I have been testing dense vector search on Solr and it's been working
> >> great for me so far. Mine is an image search use case using OpenAI's
> >> CLIP model but the configurations are pretty much the same as yours.
> >> What embedding model are you using? And can you share a portion of the
> >> actual query?
> >>
> >> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav <kg2...@gmail.com> wrote:
> >>
> >>> Hi everyone
> >>>
> >>> I am using vector search in Solr 9.4. I am using cosine similarity
> >>> with the knn parser.
> >>>
> >>> Same as the documentation:
> >>> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> >>>
> >>> Schema
> >>> <fieldType name="knn_vector" class="solr.DenseVectorField"
> >>> vectorDimension="768" similarityFunction="cosine"/>
> >>> <field name="vector" type="knn_vector" indexed="true" stored="true"/>
> >>>
> >>> Query
> >>> q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
> >>>
> >>> The problem is that it always returns docs even if they are not
> >>> relevant. Even if I use the keyword xyz, the knn parser returns
> >>> documents, which is useless. I want to control the similarity of
> >>> documents; I need only highly similar documents. Does Solr have any
> >>> parameter in the knn parser that controls the similarity threshold?
> >>>
> >>> *How can I control the minimum similarity threshold with the knn parser?*
> >>>
> >>> Please help. Thanks in advance.
> >>>
> >>>
> >>> --
> >>> Thanks & Regards
> >>> Kumar Gaurav
> >>>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> Founding member of The Search Network and co-author of Searching the
> Enterprise
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>
> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> Amtsgericht Charlottenburg | HRB 230712 B
> Geschäftsführer: John M. Woodell | David E. Pugh
> Finanzamt: Berlin Finanzamt für Körperschaften II
>
>
