[ https://issues.apache.org/jira/browse/SOLR-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Benedetti updated SOLR-15449:
----------------------------------------
    Description:
h1. Intro

In multi-field search with different analysis per field:

sow=true implies that minimum should match (mm) is applied "per document",
i.e. to be a match, a document must contain the query terms required by mm at least once, anywhere in the queried fields.

sow=false implies that mm is applied "per field",
i.e. to be a match, a document must contain the query terms required by mm at least once within a single field.

When the parsed query moves from being term-centric (sow=true) to field-centric (sow=false and different text analysis), mm means two different things:

{code:java}
sow = true
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax

"parsedquery_toString":
"+(((author:united | subjects_as_same_term:united) (author:kingdom | subjects_as_same_term:kingdom))~2)"
{code}

{code:java}
"response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
  {
    "id":"888888",
    "author":"united",
    "subjects":["kingdom"],
    "score":7.757958},
  {
    "id":"77777",
    "author":"united kingdom",
    "score":5.874222}]
},
{code}

Minimum of query terms matched within the same field (i.e. all required query terms must be found in one of the fields): "PER FIELD"

{code:java}
sow = false
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax

"parsedquery_toString":
"+(((author:united author:kingdom)~2) |
(((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain))~1))"
{code}

Here (author:united author:kingdom)~2 means both clauses must match for a document to be a good candidate. It is in disjunction with
(subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain)~1, which needs only one clause to match, because synonym expansion turned the two original terms into a single one.

{code:java}
"response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
  {
    "id":"77777",
    "author":"united kingdom",
    "score":5.874222}]
}
{code}

h1. Problem

When a field's text analysis is incompatible with the query text, mm is not fully respected:

{code:java}
sow = false
mm=100%
qf = text numeric_i
q = terminator 100
defType = edismax

"parsedquery_toString":
"+(((text:terminator text:100)~2) |
(numeric_i:100)~1))"
{code}

The numeric_i sub-query contains the single clause numeric_i:100, so within that field mm=100% amounts to one clause. A document containing just '100' in the numeric_i field is therefore returned as a valid search result, even though it does not satisfy mm=100% for the query as the user typed it.

> edimax sow causes issues with minimum should match in case of multi field with different analysis
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15449
>                 URL: https://issues.apache.org/jira/browse/SOLR-15449
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 8.8.2
>            Reporter: Alessandro Benedetti
>            Priority: Major
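
The three parsed queries discussed above can be checked by hand with a small model. What follows is only a sketch under stated assumptions, not Solr or Lucene code: a document is reduced to a set of indexed terms per field, the helper names (term, should, dismax) are made up for illustration, and the indexed terms of subjects_as_same_term for document 888888 are assumed, since the issue only shows stored values.

```python
# Toy model of the parsed queries quoted in this issue. This is not Solr or
# Lucene code: a document is a dict mapping field name -> set of indexed terms,
# and every helper name below is hypothetical.

def term(field, value):
    """field:value clause, true when the value is indexed in that field."""
    return lambda doc: value in doc.get(field, set())

def should(clauses, mm):
    """(c1 c2 ...)~mm: at least mm of the optional clauses must match."""
    return lambda doc: sum(1 for clause in clauses if clause(doc)) >= mm

def dismax(clauses):
    """(c1 | c2 ...): a disjunction, one matching clause is enough."""
    return lambda doc: any(clause(doc) for clause in clauses)

SYNONYMS = ["uk", "united kingdom", "england", "london", "british", "britain"]

# sow=true: one disjunction per query term, mm counted across terms.
term_centric = should([
    dismax([term("author", "united"), term("subjects_as_same_term", "united")]),
    dismax([term("author", "kingdom"), term("subjects_as_same_term", "kingdom")]),
], 2)

# sow=false: one sub-query per field, each with its own mm.
field_centric = dismax([
    should([term("author", "united"), term("author", "kingdom")], 2),
    should([term("subjects_as_same_term", s) for s in SYNONYMS], 1),
])

# Problem case: the text sub-query needs 2 clauses, the numeric_i one only 1.
problem_query = dismax([
    should([term("text", "terminator"), term("text", "100")], 2),
    should([term("numeric_i", "100")], 1),
])

# Assumed index contents; the issue only shows stored field values.
doc_888888 = {"author": {"united"}, "subjects_as_same_term": {"kingdom"}}
doc_77777 = {"author": {"united", "kingdom"}}
doc_only_100 = {"numeric_i": {"100"}}

print(term_centric(doc_888888), term_centric(doc_77777))    # True True
print(field_centric(doc_888888), field_centric(doc_77777))  # False True
print(problem_query(doc_only_100))                          # True
```

Under these assumptions the model reproduces the reported numFound of 2 for sow=true and 1 for sow=false, and it shows why the document holding only '100' in numeric_i passes: the disjunction across fields lets the one-clause numeric sub-query satisfy the whole query on its own.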