What's your full Solr query? Are you on SolrCloud or a single Solr node?

--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*
e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>


On Tue, 17 Oct 2023 at 09:45, Mirko Sertic <mirko.ser...@web.de> wrote:

> To correct myself, there was a typo. I meant:
>
> If I specify topK=6, I get numFound=12, but only some of them match the
> top 6
>
>
> On 17.10.2023 at 09:31, Mirko Sertic wrote:
> > Hi!
> >
> > To keep you updated, here are some observations regarding the
> > numFound/result set size and DenseVectorQueries:
> >
> > If I specify topK=10, I get numFound=20, but only some of them match the
> > top 10
> >
> > If I specify topK=8, I get numFound=16, but only some of them match the
> > top 8
> >
> > If I specify topK=6, I get numFound=8, but only some of them match the
> > top 6
> >
> > So numFound always seems to be double the topK. Might there be a
> > correlation with sharding? Our collection has two shards, so does this
> > double the results? I wouldn't expect that, but it might be the only
> > thing relating to a constant 2 in our setup.
> >
> > Mirko
> >
> >
> > On 16.10.2023 at 14:46, Mirko Sertic wrote:
> >> Hi all,
> >>
> >> We are using Solr 9.1.1 and are trying out use cases with DenseVector
> >> fields and knn queries in mind.
> >>
> >> During our tests, we see the following results and are trying to figure
> >> out what is going on:
> >>
> >> a) We use the following main query: {!knn f=VECTOR_FIELD topK=10}[VECTOR DATA].
> >> We use it as a main query because we want to
> >> apply the distance function to the document score.
> >> However, when I try
> >> to debug and explain the search results, I am getting more than topK=10
> >> result documents: some are marked as match = true with "within top 10",
> >> others are marked as match = false with "not in top 10". I'd expect that
> >> only matched documents are part of the search result, but there are 20
> >> result documents, and only 5 of them are matched. Did I miss something?
> >>
> >> b) The knn query results are the approximate nearest neighbors, but they
> >> might not be the best. We'd like to define some kind of cut-off value
> >> for knn document scores. Is this possible, and what would be a good way
> >> to do so? Implement a post-processing filter query with an frange on the
> >> score field?
> >>
> >> Thank you all,
> >>
> >> Mirko
> >>
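
For reference, here is a minimal sketch of the sharding hypothesis raised in the quoted messages (numFound = 2 x topK on a two-shard collection). It is a plain-Python simulation, not Solr code, and it assumes that each shard applies topK on its own and that the coordinator simply concatenates the per-shard hits. The names `similarity`, `shard_top_k` and `simulate`, the dot-product scoring and the random vectors are all hypothetical and chosen only for illustration.

```python
import heapq
import random


def similarity(a, b):
    # Plain dot product; a stand-in for whatever similarity the vector field uses.
    return sum(x * y for x, y in zip(a, b))


def shard_top_k(shard_docs, query, k):
    # Assumption: every shard independently keeps only its own k best hits.
    scored = [(similarity(vec, query), doc_id) for doc_id, vec in shard_docs.items()]
    return heapq.nlargest(k, scored)


def simulate(num_shards=2, docs_per_shard=100, k=10, dim=4, seed=42):
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    shards = [
        {f"s{s}-d{i}": [rng.random() for _ in range(dim)] for i in range(docs_per_shard)}
        for s in range(num_shards)
    ]
    # Coordinator view: concatenation of every shard's local top k.
    merged = [hit for shard in shards for hit in shard_top_k(shard, query, k)]
    # The collection-wide top k is always contained in the merged per-shard hits.
    global_top = set(heapq.nlargest(k, merged))
    within_top_k = [hit for hit in merged if hit in global_top]
    return len(merged), len(within_top_k)


num_found, within_top_k = simulate()
print(num_found, within_top_k)  # 20 10
```

Under these assumptions, topK=10, 8 and 6 on two shards give 20, 16 and 12 merged hits, which matches the numFound values reported above. The sketch does not explain why only 5 of the 20 documents were flagged as "within top 10" in the debug output; that would depend on how Solr computes the explain per shard, which is not modelled here.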