Hello All,

We have a relatively new Solr instance:

solr-spec: 9.5.0
solr-impl: 9.5.0 cdd27dd15c3a6574032e9b1b92b148ab4e383599 - gerlowskija - 2024-02-07 15:10:39
lucene-spec: 9.9.2
lucene-impl: 9.9.2 a2939784c4ca60bc28bf488b5479c02fc2e5e22c - 2024-01-25 09:51:09
JVM Runtime: Eclipse Adoptium OpenJDK 64-Bit Server VM 17.0.10 17.0.10+7

We run the Solr instance in a Kubernetes cluster in GCP. We have two collections, but only one of them contains documents right now. We have indexed ~70,000 tasks (one of the document types we index) in that collection; in total it holds ~100,000 documents. Note that in production we still use an older Solr version (8.11.2) with ~5,000,000 tasks, and the following problem does not appear there.

The collections are all set up with the _default config and use only 1 shard each. autoAddReplicas is set to false, the replicationFactor is 1, and maxShardsPerNode is also 1. Or at least that's how we configured the collections. In the debug response you will see that somehow multiple shards are at play.

Now the problem: every task has a parent id, which we call processId. We use this processId to find all the tasks that belong to one process, so a search for a given processId should return exactly the tasks of the corresponding process.

For example, we have a process with the processId 20021454 (this is the real processId; I have chosen to show you the real number because maybe this number is forbidden in Solr?!). One would expect to find all the tasks that belong to this process with this query: "task_coopProcessId:20021454". We know for a fact that this process contains exactly four tasks. That is also what Solr returns: four tasks. But two of the returned tasks do not belong to this process.

Below is the response we get from Solr (to keep it short, I have set the fl parameter so that only the field relevant to this problem is shown). I have also attached the result with debug info enabled (example.json).
You will need to mentally replace <insert-project-name> with a real project name, which I am not going to name here.

{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 9,
    "params": {
      "q": "task_coopProcessId:20021454",
      "indent": "true",
      "fl": "task_coopProcessId",
      "q.op": "OR",
      "useParams": ""
    }
  },
  "response": {
    "numFound": 4,
    "start": 0,
    "maxScore": 1,
    "numFoundExact": true,
    "docs": [
      { "task_coopProcessId": 2008387 },
      { "task_coopProcessId": 20021454 },
      { "task_coopProcessId": 2008403 },
      { "task_coopProcessId": 20021454 }
    ]
  }
}

With kind regards,
Dario Viva
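
P.S. In case it helps anyone reproduce this, here is a minimal Python sketch of the check described above: run the same query and list every returned document whose task_coopProcessId differs from the id that was searched for. The base URL and collection name passed to check() are placeholders, not our real ones, and the function names are purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FIELD = "task_coopProcessId"


def build_select_url(base_url, collection, process_id):
    """Build the /select URL for a lookup of a single processId."""
    params = urlencode({"q": f"{FIELD}:{process_id}", "fl": FIELD, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def mismatched_docs(response, process_id):
    """Return the docs whose stored field value differs from the queried id."""
    docs = response["response"]["docs"]
    return [doc for doc in docs if doc.get(FIELD) != process_id]


def check(base_url, collection, process_id):
    """Query Solr and return the unexpected docs (base_url and collection are placeholders)."""
    with urlopen(build_select_url(base_url, collection, process_id)) as resp:
        return mismatched_docs(json.load(resp), process_id)
```

Applied to the response shown above, mismatched_docs() returns the two documents with the ids 2008387 and 2008403. On a healthy index it should return an empty list.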