Hello Dario. The mailing list stripped the attachment, but the debugQuery output is what we need to look at here.
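For reference, a minimal Python sketch of how that debug output could be requested for the query quoted below. It only builds the request URL and sends nothing. The base URL and the collection name `tasks` are placeholders, not values from this thread; `debugQuery=true` is the standard Solr request parameter that adds a `debug` section to the response.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with the real Solr base URL and collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "tasks"  # placeholder, the real collection name is not given in the thread


def build_debug_url(process_id: int) -> str:
    """Build a /select URL that asks Solr to include debug output."""
    params = {
        "q": f"task_coopProcessId:{process_id}",  # same query as in the report
        "fl": "task_coopProcessId",               # same field list as in the report
        "debugQuery": "true",                     # adds the "debug" section to the response
        "wt": "json",
    }
    # urlencode percent-encodes the colon in the query string
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


url = build_debug_url(20021454)
print(url)
```

Opening the printed URL in a browser or with curl should return the usual response plus a `debug` block; its `parsedquery` entry shows how Solr actually interpreted `task_coopProcessId:20021454`. The whole JSON can then be pasted inline in a reply, since attachments do not survive the list.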
On Fri, Apr 19, 2024 at 1:41 PM <dario.v...@coop.ch> wrote:

> Hello All,
>
> We have a relatively new Solr instance:
>
> solr-spec: 9.5.0
> solr-impl: 9.5.0 cdd27dd15c3a6574032e9b1b92b148ab4e383599 - gerlowskija - 2024-02-07 15:10:39
>
> lucene-spec: 9.9.2
> lucene-impl: 9.9.2 a2939784c4ca60bc28bf488b5479c02fc2e5e22c - 2024-01-25 09:51:09
>
> JVM Runtime: Eclipse Adoptium OpenJDK 64-Bit Server VM 17.0.10 17.0.10+7
>
> We run the Solr instance in a Kubernetes cluster in GCP.
>
> We have two collections, but only one of them contains documents right now. We have indexed ~70,000 tasks (one of the types of documents we index) in one of the collections. In total there are ~100,000 documents in this collection.
>
> Note that in production we still use an older Solr version (8.11.2) with ~5,000,000 tasks, and the following problem does not appear there.
>
> The collections are all set up with the _default config and use only 1 shard each. autoAddReplicas is also configured to be false. The replicationFactor is also 1. Even the maxShardsPerNode is 1.
>
> Or at least that’s how we configured the collections. In the debugged response you will see that somehow multiple shards are at play.
>
> Now the problem:
>
> Every task has a parent id – we call it processId. We use this processId to find all the tasks that belong to one process.
>
> By searching for this processId we expect to find all the tasks that belong to the corresponding process.
>
> For example, we have a process with the processId 20021454 (this is the real processId; I have chosen to show you the real number because maybe this number is forbidden in Solr?!).
>
> One would expect to find all the tasks that belong to this process when using this query: “task_coopProcessId:20021454”.
>
> We know for a fact that this process contains exactly four tasks. That’s also what Solr returns – four tasks.
>
> But two of the tasks don’t belong to the correct process.
>
> Below is the response we get from Solr (to keep the response short, I have included the fl parameter to show only the information relevant to this problem description).
>
> I have also included the result with debug info as an attachment (example.json). You will need to mentally replace <insert-project-name> with a real project name, which I am not going to name here.
>
> {
>   "responseHeader": {
>     "zkConnected": true,
>     "status": 0,
>     "QTime": 9,
>     "params": {
>       "q": "task_coopProcessId:20021454",
>       "indent": "true",
>       "fl": "task_coopProcessId",
>       "q.op": "OR",
>       "useParams": ""
>     }
>   },
>   "response": {
>     "numFound": 4,
>     "start": 0,
>     "maxScore": 1,
>     "numFoundExact": true,
>     "docs": [
>       {
>         "task_coopProcessId": 2008387
>       },
>       {
>         "task_coopProcessId": 20021454
>       },
>       {
>         "task_coopProcessId": 2008403
>       },
>       {
>         "task_coopProcessId": 20021454
>       }
>     ]
>   }
> }
>
> With kind regards,
>
> Dario Viva

--
Sincerely yours
Mikhail Khludnev