Hi Charlie and Alessandro,

Thank you very much for replying. It is very helpful.

Both of your links are very useful, and I am very grateful to you both.

Both of you are suggesting hybrid search. Alessandro, I had already read
https://sease.io/2023/12/hybrid-search-with-apache-solr.html and experimented
with it: I am getting keyword search results plus vector search results in a
single query. I am a big fan of your work and always follow the
https://sease.io tutorials on Solr neural search.

I have some questions and points to clarify:

1. I fully understand your point that scores are generated dynamically, so I
am not sure how you will implement the cutoff feature. I used
"fq":"{!frange l=0.4}query($q,0)" to remove docs scoring below 0.4, which
works, but it is not really helpful because the scores are generated
dynamically for each request.

2. I also want to ask you about taxonomy vector generation. In the context of
e-commerce data, do you recommend concatenating all field data into one
sentence for the vector field, or using only the main fields, such as product
name and category?

3. Regarding vector generation, which open source model do you recommend? I
have used BERT, which does not give correct results in some cases.

Thank you with all my heart. I look forward to your answers.

Thanks & regards
Kumar Gaurav

On Fri, 26 Jan 2024 at 23:46, Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Kumar,
>
> kNN search in Apache Solr doesn't support any min-threshold parameter.
> To be honest, even if it did, you wouldn't be in a much better position:
> your perceived relevance won't necessarily match the 0-1 cosine similarity
> between your query and your vectors, and what you consider highly relevant
> may have a score of 0.35 for one query and 0.96 for another.
> Having such a parameter just delegates to the user the pain of setting up a
> useful threshold, which, trust me, is not an easy (or maybe even doable) job.
>
> It's on my roadmap to add a sort of auto-cutting functionality based on the
> document score, and Lucene has also added a threshold-based search (which we
> may or may not port to Apache Solr).
> In the meantime, you can play with Hybrid Search (which will also be
> improved in the future):
> https://sease.io/2023/12/hybrid-search-with-apache-solr.html
>
> Cheers
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
>
> On Fri, 26 Jan 2024 at 17:01, Charlie Hull <ch...@opensourceconnections.com>
> wrote:
>
> > Hi Kumar,
> >
> > kNN will return the k closest vectors, which, as you've found out, may not
> > be very close at all. Most of the approaches we're seeing as we work
> > with e-commerce clients involve combining kNN with a standard lexical
> > search in some way: combining the results from both, or using one to
> > boost certain results. You might find this blog useful, as it discusses
> > some strategies for coping with what you've found:
> >
> > https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
> >
> > best
> >
> > Charlie
> >
> >
> > On 26/01/2024 12:18, kumar gaurav wrote:
> > > Hi Srijan,
> > >
> > > Thanks for replying.
> > >
> > > I am using the BERT open source model to generate vectors. Are you aware
> > > of any minSimilarity threshold parameter in the knn parser?
> > >
> > > I am working with an e-commerce dataset.
> > > I get the same non-relevant results, with the same scores, whenever I
> > > search with an invalid token that is not present in my index.
> > >
> > > I want to apply some kind of minimum similarity threshold so that I can
> > > throw out the outliers and get only the nearest documents.
> > >
> > >
> > >
> > > On Fri, 26 Jan 2024 at 17:05, Srijan <shree...@gmail.com> wrote:
> > >
> > >> I have been testing dense vector search on Solr and it's been working
> > >> great for me so far. Mine is an image search use case using OpenAI's
> > >> CLIP model, but the configuration is pretty much the same as yours. What
> > >> embedding model are you using? And can you share a portion of the actual
> > >> query?
> > >>
> > >> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav <kg2...@gmail.com> wrote:
> > >>
> > >>> Hi Everyone,
> > >>>
> > >>> I am using vector search in Solr 9.4, with cosine similarity and the
> > >>> knn query parser, set up as in the documentation:
> > >>>
> > >>> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> > >>>
> > >>> Schema
> > >>> <fieldType name="knn_vector" class="solr.DenseVectorField"
> > >>>     vectorDimension="768" similarityFunction="cosine"/>
> > >>> <field name="vector" type="knn_vector" indexed="true" stored="true"/>
> > >>>
> > >>> Query
> > >>> q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
> > >>>
> > >>> The problem is that it always returns docs, even when they are not
> > >>> relevant. Even if I search for the keyword xyz, the knn parser returns
> > >>> documents, which is useless. I want to control the similarity of the
> > >>> returned documents: I need highly similar documents only. Does Solr have
> > >>> any parameter in the knn parser that controls the similarity threshold?
> > >>>
> > >>> *How can I control the minimum similarity threshold with the knn parser?*
> > >>>
> > >>> Please help. Thanks in advance.
> > >>>
> > >>> --
> > >>> Thanks & Regards
> > >>> Kumar Gaurav
> > >>>
> > --
> > Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > Founding member of The Search Network and co-author of Searching the
> > Enterprise
> >
>
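The behaviour described at the start of the thread (kNN always returning topK documents, even for a nonsense query like "xyz") can be reproduced outside Solr. The sketch below is a brute-force, exact kNN over a made-up three-dimensional toy index, not Solr's HNSW graph search. It assumes, as I understand Lucene's COSINE similarity function, that the score is the cosine mapped to [0, 1] as (1 + cos) / 2. The `min_score` argument is a hypothetical absolute floor (the kind of parameter asked about), applied after the top-k step; document ids and vectors are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lucene_cosine_score(a: List[float], b: List[float]) -> float:
    """Map cosine from [-1, 1] to [0, 1] as (1 + cos) / 2 (assumed Lucene scaling)."""
    return (1.0 + cosine(a, b)) / 2.0


def knn(query: List[float], docs: Dict[str, List[float]], top_k: int,
        min_score: Optional[float] = None) -> List[Tuple[str, float]]:
    """Exact kNN: score every doc and keep the top_k best.

    Without min_score, top_k hits come back however far away they are.
    min_score is a hypothetical absolute floor applied after ranking.
    """
    scored = sorted(
        ((doc_id, lucene_cosine_score(query, vec)) for doc_id, vec in docs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_k]
    if min_score is not None:
        scored = [(doc_id, s) for doc_id, s in scored if s >= min_score]
    return scored


# Hypothetical toy index: two shoes and a laptop.
docs = {
    "red-shoe": [0.9, 0.1, 0.0],
    "blue-shoe": [0.8, 0.3, 0.1],
    "laptop": [0.0, 0.2, 0.9],
}
shoe_query = [1.0, 0.2, 0.0]
nonsense_query = [0.3, 0.3, 0.3]  # stands in for a generic "xyz" embedding

print(knn(shoe_query, docs, top_k=2))
print(knn(nonsense_query, docs, top_k=2))
print(knn(nonsense_query, docs, top_k=2, min_score=0.85))
```

With these toy vectors the nonsense query still gets two hits, and even a 0.85 floor lets one of them through, because a generic embedding can sit fairly close to everything. That is the query dependence Alessandro describes.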
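Alessandro mentions an auto-cutting feature based on the document score. How that will actually be designed is not described in the thread; the sketch below shows one hypothetical heuristic of that kind: cut a ranked list at its largest drop between consecutive scores, so the cut adapts to each query instead of relying on a fixed value like the `{!frange l=0.4}` filter. `auto_cut` and `min_gap` are illustrative names, not Solr or Lucene parameters, and the scores are made up.

```python
from typing import List, Tuple

Result = Tuple[str, float]


def auto_cut(results: List[Result], min_gap: float = 0.05) -> List[Result]:
    """Hypothetical auto-cut: truncate a ranked list at its largest score drop.

    `results` must be sorted by score, best first. If no drop between
    consecutive results reaches `min_gap`, the list is returned unchanged.
    """
    if len(results) < 2:
        return list(results)
    gaps = [results[i][1] - results[i + 1][1] for i in range(len(results) - 1)]
    # Index of the largest drop (first one wins on ties).
    best = max(range(len(gaps)), key=lambda i: gaps[i])
    if gaps[best] < min_gap:
        return list(results)
    return list(results[: best + 1])


# A query whose good hits score around 0.95, then fall off sharply.
high = [("a", 0.96), ("b", 0.94), ("c", 0.71), ("d", 0.69)]
# A query whose hits all sit around 0.35-0.40 with no clear drop.
flat = [("x", 0.40), ("y", 0.38), ("z", 0.35)]

print(auto_cut(high))
print(auto_cut(flat))
```

Only the relative drop matters, so queries scoring near 0.95 and near 0.35 are handled by the same rule. The weakness is that a flat list with no clear drop, which is typical of a nonsense query, is returned unchanged, so a heuristic like this would likely need combining with something else.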
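Both replies point to hybrid search, i.e. combining a lexical ranking with a kNN ranking. One common way to merge two rankings is Reciprocal Rank Fusion (RRF), which only uses positions, so BM25 scores and cosine scores never need to be made comparable. This is a client-side sketch under that assumption; it is not necessarily the approach described in the Sease or OpenSource Connections posts, and the two input lists are made-up doc ids standing in for the results of two separate Solr queries. k = 60 is the constant commonly used with RRF.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1). Docs are returned by descending fused score,
    ties broken by doc id for a deterministic order.
    """
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical results of a lexical query and a kNN query.
lexical = ["red-shoe", "shoe-rack", "blue-shoe"]
vector = ["blue-shoe", "red-shoe", "laptop"]

print(reciprocal_rank_fusion([lexical, vector]))
```

Documents found by both retrievers (red-shoe, blue-shoe) rise to the top, while a document that only the vector side returned (laptop) drops to the bottom, which is one way to damp the irrelevant kNN hits discussed in this thread.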