Re: Zk big files issues and model store

2023-10-17 Thread Dmitri Maziuk
On 10/17/23 13:20, Walter Underwood wrote: Gzipping the JSON can be a big win, especially if there are lots of repeated keys, like in state.json. Gzip has the advantage that some editors can natively unpack it. It may save you some transfer time, provided the transport subsystem doesn't com

Solr 9.3.0: NumberRangePrefixTree error

2023-10-17 Thread Scott Vanderbilt
Hello. I posted the message below to this list back on 9 September, but it didn't seem to elicit a response. Trying again in the hopes someone can lend some assistance, for which I would be most grateful. Thanks

Re: Zk big files issues and model store

2023-10-17 Thread Walter Underwood
Apache Avro is a JSON-equivalent binary format. That would be smaller. Looking around the web, it might be 2X to 4X smaller. Gzipping the JSON can be a big win, especially if there are lots of repeated keys, like in state.json. Gzip has the advantage that some editors can natively unpack it. T
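To see why repeated keys favour gzip, here is a minimal Python sketch. It builds a JSON document that repeats the same keys for many entries and compares the raw and gzipped sizes. The document structure and every name in it are hypothetical and only loosely shaped like a cluster state file. It is not the real state.json schema, and it says nothing about how Avro would compare.

```python
import gzip
import json


def build_state_like_json(num_replicas: int) -> bytes:
    # Hypothetical, illustrative structure only (NOT the real state.json schema).
    # The point is that the same keys repeat for every replica entry.
    replicas = {
        f"core_node{i}": {
            "core": f"collection1_shard1_replica_n{i}",
            "node_name": f"host{i % 3}:8983_solr",
            "state": "active",
            "type": "NRT",
            "base_url": f"http://host{i % 3}:8983/solr",
        }
        for i in range(num_replicas)
    }
    doc = {"collection1": {"shards": {"shard1": {"replicas": replicas}}}}
    return json.dumps(doc).encode("utf-8")


raw = build_state_like_json(200)
compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")

# Round-trip check: the gzipped bytes decompress back to the same JSON.
assert json.loads(gzip.decompress(compressed)) == json.loads(raw)
```

The exact ratio depends on how repetitive the real document is, so it is worth measuring on an actual file before choosing a format.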

Re: Zk big files issues and model store

2023-10-17 Thread Christine Poerschke (BLOOMBERG/ LONDON/ V)
Hi Florin and Matthias, Thanks for sharing about this! Looking into where the JSON indentation in storage comes from -- from code reading only -- I think this is the code trail: * https://github.com/apache/solr/blob/releases/solr/9.4.0/solr/modules/ltr/src/java/org/apache/solr/ltr/store/rest/M
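The size cost of indentation is easy to measure. This minimal Python sketch serializes the same object once with indentation and once compactly. It assumes a hypothetical linear model definition: the model name, feature names and weights are made up for illustration. It does not reproduce the Solr LTR code trail referenced above. It only shows that indentation whitespace adds bytes to the stored document without changing its content.

```python
import json

# Hypothetical model definition; names and weights are illustrative only.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "example_model",
    "features": [{"name": f"feature_{i}"} for i in range(50)],
    "params": {"weights": {f"feature_{i}": 0.1 * i for i in range(50)}},
}

indented = json.dumps(model, indent=2)                 # pretty-printed
compact = json.dumps(model, separators=(",", ":"))     # no extra whitespace

print(f"indented: {len(indented)} chars, compact: {len(compact)} chars")

# Both encodings describe exactly the same data.
assert json.loads(indented) == json.loads(compact)
```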

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Alessandro Benedetti
b) The knn query results are the approximate nearest neighbors, but they might not be the best. We'd like to define some kind of cut-off value for knn document scores. Is this possible, and what would be a good way to do so? Implement a post-processing filter query with an frange on the score field
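One way to express that suggestion is to keep the knn query as the main query and add a function range filter over its score. The Python sketch below only builds the request parameters. The field name VECTOR_FIELD comes from this thread. The helper name, the 0.8 threshold and the fl list are hypothetical. The sketch assumes the frange parser's post-filter mode (cache=false with a cost of at least 100) and the query() function reading the main query through $q. A sensible threshold depends on the similarity function configured for the vector field.

```python
from urllib.parse import urlencode


def knn_with_score_cutoff(vector, field="VECTOR_FIELD", top_k=10, min_score=0.8):
    # Hypothetical helper: builds Solr request parameters as a dict.
    vector_literal = "[" + ",".join(str(x) for x in vector) + "]"
    return {
        # Main query: approximate nearest neighbours on the vector field.
        "q": f"{{!knn f={field} topK={top_k}}}{vector_literal}",
        # Keep only documents whose score for the main query is >= min_score.
        # cache=false and cost >= 100 request post-filter execution.
        "fq": f"{{!frange l={min_score} cache=false cost=200}}query($q)",
        "fl": "id,score",
    }


params = knn_with_score_cutoff([0.1, 0.2, 0.3], top_k=10, min_score=0.8)
print(urlencode(params))
```

The resulting query string can be appended to a /select request. Because the filter runs after the knn search, it can only shrink the topK candidates, never add to them.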

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Alessandro Benedetti
Hi Mirko, topK is applied per shard, and then the shards * k results are aggregated. Does that make sense? Regarding the debugging, it looks like a bug: they should all have a score and be within the top k.
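The per-shard behaviour can be illustrated with a small simulation. This Python sketch makes two assumptions: each shard scores its documents exactly (real HNSW search is approximate), and the scores are random numbers. Every shard returns its own top k, all candidates are collected, and the coordinator keeps the best k. With two shards and k=10 the candidate pool has 20 entries, matching the numFound=20 reported for topK=10 in this thread. Only 10 of those candidates make the final top k. All names are hypothetical.

```python
import heapq
import random


def shard_top_k(shard, k):
    # Each shard answers the knn query on its own and returns its k best
    # (doc_id, score) pairs. Scoring is exact here; HNSW search is approximate.
    return heapq.nlargest(k, shard.items(), key=lambda hit: hit[1])


def distributed_knn(shards, k):
    # Union of every shard's top k: shards * k candidates in total.
    candidates = [hit for shard in shards for hit in shard_top_k(shard, k)]
    # The coordinator ranks the candidates and keeps only the best k.
    merged = heapq.nlargest(k, candidates, key=lambda hit: hit[1])
    return len(candidates), merged


# Two hypothetical shards of 100 documents each, random scores, fixed seed.
rng = random.Random(42)
shards = [{f"shard{s}_doc{i}": rng.random() for i in range(100)} for s in range(2)]

num_found, top = distributed_knn(shards, k=10)
print(num_found)  # prints 20 (2 shards * k=10)
print([doc_id for doc_id, _ in top])
```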

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Mirko Sertic
Hey! Thank you for your help! We are running in cloud mode on GKE. Our index has 2 shards, and every shard has 2 replicas. The leader is a TLOG replica, and the other replica is a PULL replica. Our main query is basically {!knn f=VECTOR_FIELD topK=10}[VECTOR DATA]. That's it. I am really unsure how to debug th

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Alessandro Benedetti
What's your full Solr query? Are you on SolrCloud or a single Solr node?

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Mirko Sertic
To correct myself, there was a typo. I meant: if I specify topK=6, I get numFound=12, but only some of them match the top 6. On 17.10.2023 at 09:31, Mirko Sertic wrote: Hi! To keep you updated, here are some observations regarding the numFound/result set size and DenseVectorQueries: If I specify

Re: knn query parser, number of results and filtering by score

2023-10-17 Thread Mirko Sertic
Hi! To keep you updated, here are some observations regarding the numFound/result set size and DenseVectorQueries: If I specify topK=10, I get numFound=20, but only some of them match the top 10. If I specify topK=8, I get numFound=16, but only some of them match the top 8. If I specify topK=6, I