Hi Kumar,

kNN will always return the k closest vectors, which, as you've found out, may not be very close at all. Most of the approaches we're seeing as we work with e-commerce clients combine kNN with a standard lexical search in some way: either merging the results from both, or using one to boost certain results from the other. You might find this blog useful, as it discusses some strategies for coping with what you've found: https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
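
As a rough illustration of those two ideas, here is a minimal client-side sketch in Python. It is not a Solr feature or API: the function names, the hit format (dicts with "id" and "score"), the 0.8 cut-off and the sample data are all assumptions made up for the example. It first drops kNN hits below a score threshold, then merges the surviving kNN ranking with a lexical ranking using reciprocal rank fusion, which is just one of several possible ways to combine the two lists.

```python
from collections import defaultdict


def filter_by_min_score(hits, min_score):
    """Keep only hits whose score is at least min_score (order preserved)."""
    return [hit for hit in hits if hit["score"] >= min_score]


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of doc ids into one ranking.

    Each doc earns 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default, not a tuned value.
    """
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical results, as if already fetched from the search engine.
knn_hits = [
    {"id": "a", "score": 0.93},
    {"id": "b", "score": 0.81},
    {"id": "c", "score": 0.52},  # far from the query vector
]
lexical_ids = ["b", "d", "a"]

# Assumed threshold; a sensible value depends on the model and the data.
close_hits = filter_by_min_score(knn_hits, min_score=0.8)
close_ids = [hit["id"] for hit in close_hits]

merged = reciprocal_rank_fusion([close_ids, lexical_ids])
print(merged)
```

If a nonsense query produces no lexical matches and no kNN hits above the threshold, the merged list is empty, which is the behaviour you are after. The threshold itself would need tuning against real queries.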

best

Charlie


On 26/01/2024 12:18, kumar gaurav wrote:
Hi Srijan,

Thanks for replying.

I am using the open source BERT model to generate vectors. Are you aware of
any minimum similarity threshold parameter in the knn parser?

I am working with an e-commerce dataset. I get the same non-relevant
results, with the same scores, whenever I search for an invalid token
that is not present in my index.

I want to apply some kind of minimum similarity threshold so that I can
throw out the outliers and get only the nearest documents.



On Fri, 26 Jan 2024 at 17:05, Srijan <shree...@gmail.com> wrote:

I have been testing dense vector search on Solr and it's been working great
for me so far. Mine is an image search use case using OpenAI's CLIP model
but the configurations are pretty much the same as yours. What embedding
model are you using? And can you share a portion of the actual query?

On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav <kg2...@gmail.com> wrote:

Hi Everyone,

I am using vector search in Solr 9.4. I am using cosine similarity with
the knn parser.

My setup is the same as in the documentation:
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Schema
<fieldType name="knn_vector" class="solr.DenseVectorField"
vectorDimension="768" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Query
q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

The problem is that it always returns documents, even when they are not
relevant. Even if I search for a nonsense keyword such as "xyz", the knn
parser returns documents, which is useless. I want to control the similarity
of the returned documents, because I need highly similar documents only.
Does Solr have any parameter in the knn parser that controls the similarity
threshold?

*How can I control the minimum similarity threshold with the knn parser?*

Please help. Thanks in advance.


--
Thanks & Regards
Kumar Gaurav

--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II
