Points are a somewhat specific thing. Why not start with StrField?

On Mon, Apr 22, 2024 at 6:02 PM <dario.v...@coop.ch> wrote:
> Hello Mikhail, (resent with hopefully better representation of table)
>
> The field type is plong. Hopefully this helps us find the problem.
>
> As I cannot send you screenshots, I will attempt to send you an ASCII
> representation of what I see under the Schema endpoint in the web admin
> view.
>
> Field: task_coopProcessId
> Type: Plong
>
> Field: task_coopProcessId
> Field-Type: org.apache.solr.schema.LongPointField
>
> +------------+---------+--------------+------------+------------------------------------+-------------------+
> | Flags:     | Indexed | UnInvertible | Omit Norms | Omit Terms Frequencies & Positions | Sort Missing Last |
> +------------+---------+--------------+------------+------------------------------------+-------------------+
> | Properties | X       | X            | X          | X                                  | X                 |
> | Schema     | X       | X            | X          | X                                  | X                 |
> +------------+---------+--------------+------------+------------------------------------+-------------------+
>
> (view the text in a monospaced font so the table stays aligned)
>
> Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer
> Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer
>
> With kind regards,
>
> Dario
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, 22 April 2024 11:34
> To: users@solr.apache.org
> Subject: Re: Wrong documents in Response
>
> > "querystring": "task_coopProcessId:20021454",
> > "parsedquery": "(task_coopProcessId:[20021454 TO 20021454])",
>
> What's the field type here? How was the string parsed into a range? I suppose it
> may be just a StrField.
>
> > "task_46916": {
> >   "match": false,
> >   "value": 0,
> >   "description": "task_coopProcessId:[20021454 TO 20021454] doesn't match id 30330"
>
> Also, I don't know how non-matching docs may appear in the result.
>
> On Mon, Apr 22, 2024 at 11:16 AM <dario.v...@coop.ch> wrote:
>
> > Sure, here it is directly in mail. Hopefully it does not get chopped.
> > > > { > > "responseHeader": { > > "zkConnected": true, > > "status": 0, > > "QTime": 29, > > "params": { > > "q": "task_coopProcessId:20021454", > > "indent": "true", > > "fl": "task_coopProcessId", > > "q.op": "OR", > > "debug.explain.structured": "true", > > "debugQuery": "true", > > "useParams": "" > > } > > }, > > "response": { > > <same as in response without debug> > > }, > > "debug": { > > "track": { > > "rid": > > > "<insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>-14886", > > "EXECUTE_QUERY": { > > "https:// > <insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>:8983/solr/workflow_shard1_replica_n4/": > > { > > "QTime": "0", > > "ElapsedTime": "11", > > "RequestPurpose": > > "GET_TOP_IDS,SET_TERM_STATS", > > "NumFound": "0", > > "Response": > > "{responseHeader={zkConnected=true, status=0, QTime=0, params={df=_text_, > > distrib=false, debug=[false, timing, track], fl=[id, score], > > shards.purpose=16388, start=0, fsv=true, q.op=OR, rows=10, > > > rid=<insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>-14886, > > debug.explain.structured=true, version=2, q=task_coopProcessId:20021454, > > omitHeader=false, requestPurpose=GET_TOP_IDS,SET_TERM_STATS, > > NOW=1713520858995, isShard=true, wt=javabin, debugQuery=false, > > useParams=}}, > > response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]}, > > sort_values={}, debug={timing={time=0.0, prepare={time=0.0, > > query={time=0.0}, facet={time=0.0}, facet_module={time=0.0}, > > mlt={time=0.0}, highlight={time=0.0}, stats={time=0.0}, > expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}, process={time=0.0, query={time=0.0}, > > facet={time=0.0}, facet_module={time=0.0}, mlt={time=0.0}, > > highlight={time=0.0}, stats={time=0.0}, expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}}}}" > > }, > > "https:// > 
<insert-project-name>-solrcloud-2.<insert-project-name>-solrcloud-headless.<insert-project-name>:8983/solr/workflow_shard2_replica_n2/": > > { > > "QTime": "0", > > "ElapsedTime": "13", > > "RequestPurpose": > > "GET_TOP_IDS,SET_TERM_STATS", > > "NumFound": "0", > > "Response": > > "{responseHeader={zkConnected=true, status=0, QTime=0, params={df=_text_, > > distrib=false, debug=[false, timing, track], fl=[id, score], > > shards.purpose=16388, start=0, fsv=true, q.op=OR, rows=10, > > > rid=<insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>-14886, > > debug.explain.structured=true, version=2, q=task_coopProcessId:20021454, > > omitHeader=false, requestPurpose=GET_TOP_IDS,SET_TERM_STATS, > > NOW=1713520858995, isShard=true, wt=javabin, debugQuery=false, > > useParams=}}, > > response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]}, > > sort_values={}, debug={timing={time=0.0, prepare={time=0.0, > > query={time=0.0}, facet={time=0.0}, facet_module={time=0.0}, > > mlt={time=0.0}, highlight={time=0.0}, stats={time=0.0}, > expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}, process={time=0.0, query={time=0.0}, > > facet={time=0.0}, facet_module={time=0.0}, mlt={time=0.0}, > > highlight={time=0.0}, stats={time=0.0}, expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}}}}" > > }, > > "https:// > <insert-project-name>-solrcloud-0.<insert-project-name>-solrcloud-headless.<insert-project-name>:8983/solr/workflow_shard3_replica_n1/": > > { > > "QTime": "0", > > "ElapsedTime": "17", > > "RequestPurpose": > > "GET_TOP_IDS,SET_TERM_STATS", > > "NumFound": "4", > > "Response": > > "{responseHeader={zkConnected=true, status=0, QTime=0, params={df=_text_, > > distrib=false, debug=[false, timing, track], fl=[id, score], > > shards.purpose=16388, start=0, fsv=true, q.op=OR, rows=10, > > > rid=<insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>-14886, > > 
debug.explain.structured=true, version=2, q=task_coopProcessId:20021454, > > omitHeader=false, requestPurpose=GET_TOP_IDS,SET_TERM_STATS, > > NOW=1713520858995, isShard=true, wt=javabin, debugQuery=false, > > useParams=}}, > > > response={numFound=4,numFoundExact=true,start=0,maxScore=1.0,docs=[SolrDocument{id=task_46914, > > score=1.0}, SolrDocument{id=task_46915, score=1.0}, > > SolrDocument{id=task_46916, score=1.0}, SolrDocument{id=task_46917, > > score=1.0}]}, sort_values={}, debug={timing={time=0.0, prepare={time=0.0, > > query={time=0.0}, facet={time=0.0}, facet_module={time=0.0}, > > mlt={time=0.0}, highlight={time=0.0}, stats={time=0.0}, > expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}, process={time=0.0, query={time=0.0}, > > facet={time=0.0}, facet_module={time=0.0}, mlt={time=0.0}, > > highlight={time=0.0}, stats={time=0.0}, expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}}}}" > > } > > }, > > "GET_FIELDS": { > > "https:// > <insert-project-name>-solrcloud-0.<insert-project-name>-solrcloud-headless.<insert-project-name>:8983/solr/workflow_shard3_replica_n1/": > > { > > "QTime": "1", > > "ElapsedTime": "4", > > "RequestPurpose": > > "GET_FIELDS,GET_DEBUG,SET_TERM_STATS", > > "NumFound": "4", > > "Response": > > "{responseHeader={zkConnected=true, status=0, QTime=1, params={df=_text_, > > distrib=false, debug=[timing, track], fl=[task_coopProcessId, id], > > shards.purpose=16704, q.op=OR, rows=10, > > > rid=<insert-project-name>-solrcloud-1.<insert-project-name>-solrcloud-headless.<insert-project-name>-14886, > > debug.explain.structured=true, version=2, q=task_coopProcessId:20021454, > > omitHeader=false, requestPurpose=GET_FIELDS,GET_DEBUG,SET_TERM_STATS, > > NOW=1713520858995, ids=task_46915,task_46914,task_46917,task_46916, > > isShard=true, wt=javabin, debugQuery=true, useParams=}}, > > > response={numFound=4,numFoundExact=true,start=0,docs=[SolrDocument{task_coopProcessId=20021454}, > > SolrDocument{task_coopProcessId=2008387}, 
> > SolrDocument{task_coopProcessId=20021454}, > > SolrDocument{task_coopProcessId=2008403}]}, > > debug={rawquerystring=task_coopProcessId:20021454, > > querystring=task_coopProcessId:20021454, > > parsedquery=(task_coopProcessId:[20021454 TO 20021454]), > > parsedquery_toString=task_coopProcessId:[20021454 TO 20021454], > > explain={task_46915={match=true, value=1.0, > > description=task_coopProcessId:[20021454 TO 20021454]}, > > task_46914={match=false, value=0.0, > > description=task_coopProcessId:[20021454 TO 20021454] doesn't match id > > 30378}, task_46917={match=true, value=1.0, > > description=task_coopProcessId:[20021454 TO 20021454]}, > > task_46916={match=false, value=0.0, > > description=task_coopProcessId:[20021454 TO 20021454] doesn't match id > > 30330}}, QParser=LuceneQParser, timing={time=1.0, prepare={time=0.0, > > query={time=0.0}, facet={time=0.0}, facet_module={time=0.0}, > > mlt={time=0.0}, highlight={time=0.0}, stats={time=0.0}, > expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}, process={time=0.0, query={time=0.0}, > > facet={time=0.0}, facet_module={time=0.0}, mlt={time=0.0}, > > highlight={time=0.0}, stats={time=0.0}, expand={time=0.0}, > > terms={time=0.0}, debug={time=0.0}}}}}" > > } > > } > > }, > > "timing": { > > "time": 1, > > "prepare": { > > "time": 0, > > "query": { > > "time": 0 > > }, > > "facet": { > > "time": 0 > > }, > > "facet_module": { > > "time": 0 > > }, > > "mlt": { > > "time": 0 > > }, > > "highlight": { > > "time": 0 > > }, > > "stats": { > > "time": 0 > > }, > > "expand": { > > "time": 0 > > }, > > "terms": { > > "time": 0 > > }, > > "debug": { > > "time": 0 > > } > > }, > > "process": { > > "time": 0, > > "query": { > > "time": 0 > > }, > > "facet": { > > "time": 0 > > }, > > "facet_module": { > > "time": 0 > > }, > > "mlt": { > > "time": 0 > > }, > > "highlight": { > > "time": 0 > > }, > > "stats": { > > "time": 0 > > }, > > "expand": { > > "time": 0 > > }, > > "terms": { > > "time": 0 > > }, > > 
> >         "debug": {
> >           "time": 0
> >         }
> >       }
> >     },
> >     "rawquerystring": "task_coopProcessId:20021454",
> >     "querystring": "task_coopProcessId:20021454",
> >     "parsedquery": "(task_coopProcessId:[20021454 TO 20021454])",
> >     "parsedquery_toString": "task_coopProcessId:[20021454 TO 20021454]",
> >     "QParser": "LuceneQParser",
> >     "explain": {
> >       "task_46914": {
> >         "match": false,
> >         "value": 0,
> >         "description": "task_coopProcessId:[20021454 TO 20021454] doesn't match id 30378"
> >       },
> >       "task_46915": {
> >         "match": true,
> >         "value": 1,
> >         "description": "task_coopProcessId:[20021454 TO 20021454]"
> >       },
> >       "task_46916": {
> >         "match": false,
> >         "value": 0,
> >         "description": "task_coopProcessId:[20021454 TO 20021454] doesn't match id 30330"
> >       },
> >       "task_46917": {
> >         "match": true,
> >         "value": 1,
> >         "description": "task_coopProcessId:[20021454 TO 20021454]"
> >       }
> >     }
> >   }
> > }
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Saturday, 20 April 2024 11:36
> > To: users@solr.apache.org
> > Cc: solr-u...@lucene.apache.org
> > Subject: Re: Wrong documents in Response
> >
> > Hello Dario.
> > The mailing list chopped the attachment, but looking into debugQuery is what we
> > need here.
> >
> > On Fri, Apr 19, 2024 at 1:41 PM <dario.v...@coop.ch> wrote:
> >
> > > Hello All,
> > >
> > > We have a relatively new Solr instance:
> > >
> > > solr-spec: 9.5.0
> > > solr-impl: 9.5.0 cdd27dd15c3a6574032e9b1b92b148ab4e383599 - gerlowskija - 2024-02-07 15:10:39
> > >
> > > lucene-spec: 9.9.2
> > > lucene-impl: 9.9.2 a2939784c4ca60bc28bf488b5479c02fc2e5e22c - 2024-01-25 09:51:09
> > >
> > > JVM Runtime: Eclipse Adoptium OpenJDK 64-Bit Server VM 17.0.10 17.0.10+7
> > >
> > > We run the Solr instance in a Kubernetes cluster in GCP.
> > >
> > > We have two collections, but only one of them contains documents right now. We
> > > have indexed ~70,000 tasks (one of the types of documents we index) in that
> > > collection. In total there are ~100,000 documents in this collection.
> > >
> > > Note that in production we still use an older Solr version (8.11.2) with
> > > ~5,000,000 tasks, and the following problem does not appear there.
> > >
> > > The collections are all set up with the _default config and use only 1
> > > shard each. autoAddReplicas is configured to be false, the
> > > replicationFactor is 1, and even maxShardsPerNode is 1.
> > >
> > > Or at least that's how we configured the collections. In the debug
> > > response you will see that somehow multiple shards are at play.
> > >
> > > Now the problem:
> > >
> > > Every task has a parent id; we call it processId. We use this processId
> > > to find all the tasks that belong to one process.
> > >
> > > By searching for this processId we expect to find all the tasks that
> > > belong to the corresponding process.
> > >
> > > For example, we have a process with the processId 20021454 (this is the
> > > real processId; I have chosen to show you the real number because maybe
> > > this number is forbidden in Solr?!).
> > >
> > > One would expect to find all the tasks that belong to this process when
> > > using this query: "task_coopProcessId:20021454".
> > >
> > > We know for a fact that this process contains exactly four tasks. That's
> > > also what solr returns - four tasks.
> > >
> > > But two of the tasks don't belong to the correct process.
> > >
> > > Below is the response we get from solr (to keep the response short, I have
> > > included the fl parameter, to only show the important info for this problem
> > > description).
> > >
> > > I have also included the result when showing debug info as an attachment
> > > (example.json). You will need to mentally replace <insert-project-name>
> > > with a real project name, that I am not going to name here.
> > >
> > > {
> > >   "responseHeader": {
> > >     "zkConnected": true,
> > >     "status": 0,
> > >     "QTime": 9,
> > >     "params": {
> > >       "q": "task_coopProcessId:20021454",
> > >       "indent": "true",
> > >       "fl": "task_coopProcessId",
> > >       "q.op": "OR",
> > >       "useParams": ""
> > >     }
> > >   },
> > >   "response": {
> > >     "numFound": 4,
> > >     "start": 0,
> > >     "maxScore": 1,
> > >     "numFoundExact": true,
> > >     "docs": [
> > >       {
> > >         "task_coopProcessId": 2008387
> > >       },
> > >       {
> > >         "task_coopProcessId": 20021454
> > >       },
> > >       {
> > >         "task_coopProcessId": 2008403
> > >       },
> > >       {
> > >         "task_coopProcessId": 20021454
> > >       }
> > >     ]
> > >   }
> > > }
> > >
> > > With kind regards,
> > >
> > > Dario Viva
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
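
For anyone trying to reproduce the symptom, here is a minimal sketch of a client-side check. It is not something posted in the thread. The response body and the explain entries are copied from the messages above; the helper names (parse_term_query, mismatched_docs, unmatched_ids) are made up for illustration, and the query parsing only handles the single numeric field:value form used here.

```python
import json

# Response body quoted in the thread (fl=task_coopProcessId),
# reduced to the parts this check needs.
RESPONSE = json.loads("""
{
  "responseHeader": {"params": {"q": "task_coopProcessId:20021454"}},
  "response": {
    "numFound": 4,
    "docs": [
      {"task_coopProcessId": 2008387},
      {"task_coopProcessId": 20021454},
      {"task_coopProcessId": 2008403},
      {"task_coopProcessId": 20021454}
    ]
  }
}
""")

# "explain" section of the structured debug output quoted in the thread.
EXPLAIN = {
    "task_46914": {"match": False, "value": 0},
    "task_46915": {"match": True, "value": 1},
    "task_46916": {"match": False, "value": 0},
    "task_46917": {"match": True, "value": 1},
}


def parse_term_query(q):
    """Split a simple 'field:value' query into (field, int value).

    Hypothetical helper: it only understands the single numeric
    term query used in the thread, not general Solr query syntax.
    """
    field, sep, value = q.partition(":")
    if not sep:
        raise ValueError(f"not a field:value query: {q!r}")
    return field, int(value)


def mismatched_docs(response, field, expected):
    """Return the returned docs whose field value differs from the queried value."""
    return [doc for doc in response["response"]["docs"] if doc.get(field) != expected]


def unmatched_ids(explain):
    """Return the ids that the explain output itself marks as match=false."""
    return sorted(doc_id for doc_id, entry in explain.items() if not entry["match"])


field, expected = parse_term_query(RESPONSE["responseHeader"]["params"]["q"])
wrong = mismatched_docs(RESPONSE, field, expected)
suspect_ids = unmatched_ids(EXPLAIN)

print(f"{len(wrong)} of {RESPONSE['response']['numFound']} docs have {field} != {expected}: {wrong}")
print("ids marked match=false in explain:", suspect_ids)
```

Running the same check against the raw response of each shard replica (with distrib=false, as seen in the tracked shard requests above) might help narrow down where the extra documents come from, though that is only a guess at a next debugging step.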