Re: knn query parser, number of results and filtering by score

Alessandro Benedetti Tue, 17 Oct 2023 01:06:17 -0700

What's your full Solr query?
Are you on SolrCloud or a single Solr node?
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*



On Tue, 17 Oct 2023 at 09:45, Mirko Sertic <mirko.ser...@web.de> wrote:

> To correct myself: there was a typo. I meant:
>
> If I specify topK=6, I get numFound=12, but only some of them match the
> top 6
>
>
> On 17.10.2023 at 09:31, Mirko Sertic wrote:
> > Hi!
> >
> > To keep you updated, here are some observations regarding the
> > numFound/resultset size and DenseVectorQueries:
> >
> > If I specify topK=10, I get numFound=20, but only some of them match the
> > top 10
> >
> > If I specify topK=8, I get numFound=16, but only some of them match the
> > top 8
> >
> > If I specify topK=6, I get numFound=8, but only some of them match the
> > top 6
> >
> > So numFound always seems to be double topK. Might there be a
> > correlation with sharding? Our collection has two shards, so does this
> > double the results? I wouldn't expect that, but it might be the only
> > thing relating to a constant 2 in our setup.
> >
> > Mirko
> >
> >
> > On 16.10.2023 at 14:46, Mirko Sertic wrote:
> >> Hi all,
> >>
> >> We are using Solr 9.1.1 and are trying out use cases involving
> >> DenseVector fields and knn queries.
> >>
> >> During our tests, we see the following results and are trying to figure
> >> out what is going on:
> >>
> >> a) We use the following main query : {!knn f=VECTOR_FIELD
> >> topK=10}[VECTOR DATA]. We use it as a main query because we want to
> >> apply the distance function to the document score. However, when I try
> >> to debug and explain the search results, I get more than topK=10
> >> result documents: some are marked as match = true with "within top 10",
> >> others are marked as match = false with "not in top 10". I'd expect
> >> only matched documents to be part of the search result, but there are 20
> >> result documents and only 5 of them are matched. Did I miss something?
> >>
> >> b) The knn query results are the approximate nearest neighbors, but they
> >> might not be the best. We'd like to define some kind of cut-off value
> >> for knn document scores. Is this possible, and what would be a good way
> >> to do so? Implement a post-processing filter query with an frange on the
> >> score field?
> >>
> >> Thank you all,
> >>
> >> Mirko
> >>
>
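
On the numFound observation quoted above: a count that is always
topK times 2 is what one would see if each of the two shards ran the
knn query on its own and returned its own topK hits, with the per-shard
counts then added up. The Python sketch below only simulates that
assumed behaviour on made-up scores. It does not talk to Solr, and
shard_top_k, merged_num_found and global_top_k are hypothetical helper
names, not Solr APIs.

```python
import heapq
import random


def shard_top_k(scores, k):
    """Return the k highest scores held by a single shard."""
    return heapq.nlargest(k, scores)


def merged_num_found(shards, k):
    """Total hits if every shard contributes its own top k (assumed behaviour)."""
    return sum(len(shard_top_k(scores, k)) for scores in shards)


def global_top_k(shards, k):
    """The true top k across all shards, for comparison."""
    return heapq.nlargest(k, (s for scores in shards for s in scores))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two shards with 100 made-up similarity scores each.
    shards = [[rng.random() for _ in range(100)] for _ in range(2)]
    for k in (10, 8, 6):
        # With two shards holding at least k documents each, this prints 2 * k.
        print(k, merged_num_found(shards, k))
```

Under this assumption, two shards give 20, 16 and 12 hits for topK=10,
8 and 6, while the global top k is still only a subset of the merged
list.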

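For the cut-off asked about in (b), one commonly used pattern is a
filter query that applies frange to the score of the main query through
the query() function. The Python sketch below only builds the request
parameters. It assumes the knn query is sent as q, that 0.8 is a
sensible lower bound for the similarity function in use, and that the
documents have an id field; the threshold, the vector values and the
helper name build_knn_params are illustrative, and the combination has
not been verified against Solr 9.1.1.

```python
from urllib.parse import urlencode


def build_knn_params(field, vector, top_k, min_score):
    """Build Solr request parameters for a knn main query with a score cut-off."""
    vector_literal = "[" + ", ".join(repr(float(v)) for v in vector) + "]"
    return {
        # Main query: knn query parser, so the similarity becomes the score.
        "q": "{!knn f=%s topK=%d}%s" % (field, top_k, vector_literal),
        # Filter: keep only documents whose main-query score is >= min_score.
        "fq": "{!frange l=%s}query($q)" % min_score,
        # Assumed field list; "id" is a placeholder for the unique key field.
        "fl": "id,score",
    }


if __name__ == "__main__":
    params = build_knn_params("VECTOR_FIELD", [1.0, 2.5, 3.0], top_k=10, min_score=0.8)
    # URL-encoded query string that could be appended to a /select request.
    print(urlencode(params))
```

The threshold depends on the similarity function configured for the
vector field, so it would need to be tuned on real data.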