[ https://issues.apache.org/jira/browse/SOLR-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355811#comment-17355811 ]
Alessandro Benedetti commented on SOLR-15449:
---------------------------------------------

I agree [~dsmiley], I changed it slightly:

"in multi-field search where the text analysis per field produces a different
number of tokens:

sow=true causes the minimum should match to be "per document"
i.e. to be a match, a document must contain all the mm query terms anywhere at
least once

sow=false causes the minimum should match to be "per field"
i.e. to be a match, a document must contain all the mm query terms in a single
field at least once"

Better now?

> edimax sow causes issues with minimum should match in case of multi field
> with different analysis
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15449
>                 URL: https://issues.apache.org/jira/browse/SOLR-15449
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 8.8.2
>            Reporter: Alessandro Benedetti
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> h1. Intro
> in multi-field search where the text analysis per field produces a different
> number of tokens:
> sow=true causes the minimum should match to be "per document"
> i.e. to be a match, a document must contain all the mm query terms anywhere
> at least once
> sow=false causes the minimum should match to be "per field"
> i.e. to be a match, a document must contain all the mm query terms in a
> single field at least once
> When the parsed query moves from being term centric (sow=true) to field
> centric (sow=false and different text analysis), mm means two different
> things:
> {code:java}
> sow = true
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united | subjects_as_same_term:united) (author:kingdom |
> subjects_as_same_term:kingdom))~2)"
> {code}
> {code:java}
> "response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
>       {
>         "id":"888888",
>         "author":"united",
>         "subjects":["kingdom"],
>         "score":7.757958},
>       {
>         "id":"77777",
>         "author":"united kingdom",
>         "score":5.874222}]
>   },
> {code}
> minimum number of query terms matched within the same field (i.e. all
> required query terms must be found in one of the fields)
> "PER FIELD"
> {code:java}
> sow = false
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united author:kingdom)~2) |
> (((subjects_as_same_term:uk subjects_as_same_term:"united kingdom"
> subjects_as_same_term:england subjects_as_same_term:london
> subjects_as_same_term:british subjects_as_same_term:britain))~1))"
> {code}
> Here (author:united author:kingdom)~2 means both clauses must match for the
> document to be a good candidate. It is in disjunction with
> (subjects_as_same_term:uk subjects_as_same_term:"united kingdom"
> subjects_as_same_term:england subjects_as_same_term:london
> subjects_as_same_term:british subjects_as_same_term:britain)~1, which needs
> only one clause to match (because synonyms expanded the two original terms
> into a single one)
> {code:java}
> "response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
>       {
>         "id":"77777",
>         "author":"united kingdom",
>         "score":5.874222}]
>   }
> {code}
> h1. Problem
> When the text analysis of a field is incompatible with the query text, mm is
> not fully respected:
> {code:java}
> sow = false
> mm=100%
> qf = text numeric_i
> q = terminator 100
> defType = edismax
> "parsedquery_toString":
> "+(((text:terminator text:100)~2) |
> (numeric_i:100)~1))"
> {code}
> A document containing only '100' in the field numeric_i is returned as a good
> search result, even though it does not satisfy mm=100% for the original
> query (it does not contain 'terminator').
> Reference:
> https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html
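
To make the two readings of mm concrete, here is a minimal Python sketch of the matching logic. It is a toy model, not Solr or Lucene code. Several things in it are assumptions for illustration only: the function names, the representation of each field as a set of already-analyzed tokens, the indexed token "kingdom" in subjects_as_same_term for document 888888 (inferred from the sow=true match above), and the hypothetical document id "only-100". The per-field thresholds are not computed; they are copied from the ~N values in the parsed queries shown in the description.

```python
# Toy model of the two mm interpretations described in the issue.
# Not Solr code: fields are plain sets of already-analyzed tokens (assumption),
# and phrase clauses such as "united kingdom" are treated as opaque strings.

def term_centric_match(doc, fields, terms, mm):
    """sow=true shape: one clause per query term, satisfied by ANY field."""
    satisfied = sum(
        1 for term in terms
        if any(term in doc.get(field, set()) for field in fields)
    )
    return satisfied >= mm


def field_centric_match(doc, per_field):
    """sow=false shape: each field has its own clauses and its own threshold."""
    for field, (clauses, field_mm) in per_field.items():
        tokens = doc.get(field, set())
        if sum(1 for clause in clauses if clause in tokens) >= field_mm:
            return True
    return False


# Documents from the example responses (indexed tokens are assumed).
DOCS = {
    "888888": {"author": {"united"}, "subjects_as_same_term": {"kingdom"}},
    "77777": {"author": {"united", "kingdom"}},
}

# Clauses and thresholds copied from the sow=false parsed query.
UK_PER_FIELD = {
    "author": (["united", "kingdom"], 2),
    "subjects_as_same_term": (
        ["uk", "united kingdom", "england", "london", "british", "britain"],
        1,
    ),
}

# Clauses and thresholds copied from the "Problem" parsed query.
TERMINATOR_PER_FIELD = {
    "text": (["terminator", "100"], 2),
    "numeric_i": (["100"], 1),
}

# Hypothetical document that only has '100' in numeric_i.
ONLY_100 = {"numeric_i": {"100"}}

if __name__ == "__main__":
    fields = ["author", "subjects_as_same_term"]
    for doc_id, doc in DOCS.items():
        print(
            doc_id,
            "sow=true:", term_centric_match(doc, fields, ["united", "kingdom"], 2),
            "sow=false:", field_centric_match(doc, UK_PER_FIELD),
        )
    print("only-100 sow=false:", field_centric_match(ONLY_100, TERMINATOR_PER_FIELD))
```

Under these assumptions the sketch agrees with the responses in the description: both documents match in the term-centric form, only 77777 matches in the field-centric form, and the document holding just '100' in numeric_i still matches the "terminator 100" query because the numeric_i branch needs a single clause.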