Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-28 Thread kumar gaurav
Hi Charlie and Alessandro,

Thank you very much for replying. It is very helpful.

Both of your links are very useful. I am very grateful to you both for
this. Both of you are suggesting hybrid search.

Alessandro
I have already read https://sease.io/2023/12/hybrid-search-with-apache-solr.html
and experimented with it. I am getting keyword search results plus vector
search results in a single query.
I am a big fan of your work and always follow the https://sease.io
tutorials on Solr neural search.

I have some questions and points I would like clarified:

1. I completely get your point that scores are generated dynamically, so I
am not sure how you will implement the cutoff feature.
I used "fq":"{!frange l=0.4}query($q,0)" to remove docs with a score below
0.4, which works, but it will not be helpful because scores are generated
dynamically for each request.

2. I also want to ask you a question about taxonomy vector generation.
In the context of e-commerce data, do you recommend putting all field data
into one sentence for the vector field, or using only the main fields such
as product name and category?

3. Regarding vector generation, which open-source model do you recommend?
I have used BERT, which does not give correct results in some cases.
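For reference, the filter from point 1 can be written out as request parameters. This is a minimal Python sketch that only builds the parameters and sends nothing; the field name `vector` comes from the original query, and `build_knn_params` is a hypothetical helper. The fixed `min_score` is exactly the static threshold whose meaning shifts from query to query.

```python
from urllib.parse import urlencode

def build_knn_params(vector, top_k=10, min_score=0.4, field="vector"):
    """Return request parameters for a knn query plus a fixed frange score filter.

    Assumptions: the dense vector field is called "vector" (as in the original
    query) and min_score is a hand-picked, static threshold.
    """
    vector_literal = "[" + ", ".join(str(x) for x in vector) + "]"
    return {
        "q": f"{{!knn f={field} topK={top_k}}}{vector_literal}",
        # query($q,0) evaluates the main query as a function (0 if no match);
        # frange keeps documents whose value is >= l (lower bound inclusive).
        "fq": f"{{!frange l={min_score}}}query($q,0)",
        "fl": "id,score",
    }

params = build_knn_params([0.1, 0.2, 0.3])
print(urlencode(params))  # query string to append to .../select?
```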

Thank you wholeheartedly. I look forward to your answers.


Thanks & regards
Kumar Gaurav


On Fri, 26 Jan 2024 at 23:46, Alessandro Benedetti wrote:

> Hi Kumar,
> kNN search in Apache Solr doesn't support any min-threshold parameter.
> To be honest, even if it did, you wouldn't be in a much better position:
> your perceived relevance won't necessarily match the 0-1 cosine similarity
> between your query and your vectors, and what you consider highly relevant
> may have a score of 0.35 for one query and 0.96 for another.
> Having such a parameter just delegates to the user the pain of setting up a
> useful threshold, which, trust me, is not an easy (or maybe even doable) job.
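On the 0-1 range: as far as I know, Lucene (which backs Solr's knn parser) does not report the raw cosine, which lies in [-1, 1]; for the cosine similarity function it rescales it to (1 + cos) / 2. The plain-Python sketch below mirrors that mapping for illustration only; it is not Lucene's code. Note that two unrelated (orthogonal) vectors still score 0.5, so a fixed cut such as l=0.4 would keep them.

```python
import math

def cosine(a, b):
    """Raw cosine similarity, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_score(a, b):
    """Rescale cosine from [-1, 1] to a [0, 1] score as (1 + cos) / 2."""
    return max((1 + cosine(a, b)) / 2, 0.0)

print(cosine_score([1.0, 0.0], [1.0, 0.0]))   # 1.0 (same direction)
print(cosine_score([1.0, 0.0], [0.0, 1.0]))   # 0.5 (unrelated, orthogonal)
print(cosine_score([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (opposite direction)
```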
>
> It's on my roadmap to add a sort of auto-cutting functionality based on the
> document score and Lucene also added a threshold-based search (which we may
> or may not port to Apache Solr).
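Purely to illustrate what score-based auto-cutting could look like on the client side, here is one simple heuristic: truncate the ranked list at the largest drop between consecutive scores. This is a hypothetical sketch written for this thread, not the functionality on Alessandro's roadmap and not a Solr or Lucene API; it assumes the scores are already sorted in descending order.

```python
def autocut(scores, min_keep=1):
    """Cut a descending list of scores at its largest drop between neighbours.

    Hypothetical heuristic for illustration; not a Solr or Lucene feature.
    If no drop is found (all scores equal), every result is kept.
    """
    scores = list(scores)
    if len(scores) <= min_keep:
        return scores
    cut, largest_drop = len(scores), 0.0
    # Start at min_keep so at least min_keep results always survive the cut.
    for i in range(min_keep, len(scores)):
        drop = scores[i - 1] - scores[i]
        if drop > largest_drop:
            cut, largest_drop = i, drop
    return scores[:cut]

print(autocut([0.91, 0.89, 0.88, 0.62, 0.60]))  # [0.91, 0.89, 0.88]
```

Because the cut is relative to the shape of each result list, it adapts per query, which a single fixed threshold cannot do; it can still misfire when scores decay smoothly.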
> In the meantime, you can play with Hybrid Search (which will also be
> improved in the future):
> https://sease.io/2023/12/hybrid-search-with-apache-solr.html
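A hybrid request of the kind described in that post combines a lexical clause and a knn clause with the bool query parser. The sketch below only builds the parameters; the `qf` fields `name` and `category` are hypothetical, `vector` is the field from the original query, and `hybrid_params` is an illustrative helper, not a Solr API.

```python
def hybrid_params(user_text, vector, top_k=10, qf="name category"):
    """Request parameters combining a lexical and a knn clause via the bool parser.

    The qf fields ("name", "category") are hypothetical; "vector" is the field
    from the original query.
    """
    vector_literal = "[" + ", ".join(str(x) for x in vector) + "]"
    return {
        # With only "should" clauses, a document matches if either clause
        # matches; the scores of the clauses it matches are summed.
        "q": "{!bool should=$lexicalQuery should=$vectorQuery}",
        "lexicalQuery": f"{{!edismax qf='{qf}'}}{user_text}",
        "vectorQuery": f"{{!knn f=vector topK={top_k}}}{vector_literal}",
        "fl": "id,score",
    }

params = hybrid_params("running shoes", [0.1, 0.2])
```

One thing to keep in mind: lexical (BM25) scores and vector scores live on different scales, so the summed score is not calibrated between the two clauses.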
>
> Cheers
>
> --
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io
>
>
> On Fri, 26 Jan 2024 at 17:01, Charlie Hull <ch...@opensourceconnections.com> wrote:
>
> > Hi Kumar,
> >
> > kNN will return the k closest vectors, which as you've found out may not
> > be very close at all. Most of the approaches we're seeing as we work
> > with e-commerce clients involve combining kNN with a standard, lexical
> > search in some way - combining the results from both, or using one to
> > boost certain results. You might find this blog useful as it discusses
> > some strategies for coping with what you've found:
> >
> > https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
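To see why kNN "may not be very close at all", consider exact (brute-force) top-k retrieval: it ranks every document and returns the k best, however dissimilar they are. Solr's dense vector search uses an approximate HNSW index rather than brute force, but the same property holds. This is an illustrative sketch with made-up toy vectors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(query, docs, k):
    """Exact top-k by cosine similarity: always returns min(k, len(docs)) hits."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:k]

docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
hits = knn([-1.0, 0.0], docs, k=2)  # query points away from every document
print([doc_id for doc_id, _ in hits])  # ['c', 'b']
```

Even though no document resembles the query (every raw cosine is at most 0), two hits come back, which matches the behaviour reported for invalid search tokens below.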
> >
> > best
> >
> > Charlie
> >
> >
> > On 26/01/2024 12:18, kumar gaurav wrote:
> > > Hi Srijan,
> > >
> > > Thanks for replying.
> > >
> > > I am using the open-source BERT model to generate vectors. Are you aware
> > > of any minSimilarity threshold parameter in the knn parser?
> > >
> > > I am working with an e-commerce dataset, so I get the same non-relevant
> > > results, with the same scores, whenever I search for an invalid token
> > > that is not present in my index.
> > >
> > > I want to apply some kind of minimum similarity threshold so I can
> > > throw out the outliers and get only the very nearest documents.
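The simplest form of what is being asked for is a client-side post-filter on the returned scores. A minimal sketch, assuming the request included `score` in `fl` and parsing a made-up Solr JSON response; `filter_by_score` is a hypothetical helper and has the same weakness as any fixed threshold discussed in this thread.

```python
import json

def filter_by_score(solr_json: str, min_score: float) -> list:
    """Keep only the docs whose "score" is at least min_score.

    Assumes the request included score in fl (e.g. fl=id,score); docs without
    a score are dropped.
    """
    docs = json.loads(solr_json)["response"]["docs"]
    return [doc for doc in docs if doc.get("score", float("-inf")) >= min_score]

# Made-up response for illustration.
sample = '''{"response": {"numFound": 3, "start": 0, "docs": [
  {"id": "p1", "score": 0.83},
  {"id": "p2", "score": 0.61},
  {"id": "p3", "score": 0.58}]}}'''
print([doc["id"] for doc in filter_by_score(sample, 0.6)])  # ['p1', 'p2']
```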
> > >
> > >
> > >
> > > On Fri, 26 Jan 2024 at 17:05, Srijan wrote:
> > >
> > >> I have been testing dense vector search on Solr and it's been working great
> > >> for me so far. Mine is an image search use case using OpenAI's CLIP model,
> > >> but the configurations are pretty much the same as yours. What embedding
> > >> model are you using? And can you share a portion of the actual query?
> > >>
> > >> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav wrote:
> > >>
> > >>> Hi everyone,
> > >>>
> > >>> I am using vector search in Solr 9.4, with cosine similarity and the knn
> > >>> query parser.
> > >>>
> > >>> Same as in the documentation:
> > >>>
> > >>> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> > >>> Schema
> > >>>  > >>> vectorDimension="768" similarityFunction="cosine"/>
> > >>> 
> > >>>
> > >>> Query
> > >>> q={!knn f=vector topK=10}
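The knn query takes the query vector as a bracketed list after the local params, and its length has to match the field's `vectorDimension`. A small sketch that builds such a query string and checks the length first; `knn_query` is a hypothetical helper, and `dim=768` mirrors `vectorDimension="768"` in the schema above.

```python
def knn_query(vector, field="vector", top_k=10, dim=768):
    """Build a knn parser query string, checking the vector length first.

    dim=768 mirrors vectorDimension="768" in the schema above.
    """
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    values = ", ".join(str(float(x)) for x in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"

q = knn_query([0.0] * 768)
print(q[:40])
```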

Re: LTR model upload API issue

2024-01-28 Thread rajani m
Thank you Ishan, here is the JIRA: SOLR-17132. Could you point me to
the API built for distributed management of package files?

On Sat, Jan 27, 2024 at 12:40 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

> Hi Rajani,
> I think the LTR models could take advantage of the file store APIs that were
> built for distributed management of package files. If you file a JIRA for
> it, someone can pick it up and work on it.
> Thanks and regards,
> Ishan
>
> On Sat, 27 Jan, 2024, 11:03 pm rajani m wrote:
>
> > Hi All,
> >
> > Similar to the schema APIs, I expected the LTR model upload endpoint to
> > distribute the model and make it available across all the nodes; however,
> > it does not. After upload, it continues to report a "model not found
> > exception". The model becomes available only after a collection "reload"
> > API request. Have you experienced this?
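The upload-then-reload workaround described above amounts to two HTTP calls: a PUT of the model JSON to the collection's LTR model store, then a Collections API RELOAD. A minimal sketch that only builds the `urllib` requests; the base URL and the collection name `techproducts` are assumptions, and nothing is sent until you pass the requests to `urllib.request.urlopen`.

```python
import json
import urllib.request
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL
COLLECTION = "techproducts"          # hypothetical collection name

def upload_model_request(model: dict) -> urllib.request.Request:
    # LTR models are uploaded to the collection's model store as a JSON PUT.
    return urllib.request.Request(
        f"{SOLR}/{COLLECTION}/schema/model-store",
        data=json.dumps(model).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def reload_collection_request() -> urllib.request.Request:
    # Collections API RELOAD: the step after which the model becomes visible.
    query = urlencode({"action": "RELOAD", "name": COLLECTION})
    return urllib.request.Request(f"{SOLR}/admin/collections?{query}")
```

Usage would be `urllib.request.urlopen(upload_model_request(model))` followed by `urllib.request.urlopen(reload_collection_request())`, where `model` is your model definition.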
> >
> > Thanks,
> > Rajani
> >
>