[ https://issues.apache.org/jira/browse/SOLR-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347410#comment-17347410 ]
Alessandro Benedetti edited comment on SOLR-15407 at 5/19/21, 8:29 AM:
-----------------------------------------------------------------------

Hi David,
first of all, thanks for taking the time to think about this, it is much appreciated.

In regard to:
{quote}sow=false implies the minimum should match is "per field"{quote}
I was thinking the same as you (i.e. sow should not affect mm, and mm should always be "per document").
Then I spent some time investigating in order to write a dedicated advanced blog post (coming out in the next few days), and I verified that currently, in 8.8.2, this is not the case.
Now, I don't know whether it's on purpose or not, but if you have a multi-field search with different analysis per field, this is what you get (I post here a piece of the upcoming blog post):

In the following examples, one field has synonyms, the other is just whitespace tokenized.
When the parsed query moves from being term-centric (sow=true) to field-centric (sow=false and different text analysis per field), mm means two different things:

Minimum number of query terms matched, regardless of which field they match in (PER DOCUMENT)
{code:java}
sow = true
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax

"parsedquery_toString": "+(((author:united | subjects_as_same_term:united) (author:kingdom | subjects_as_same_term:kingdom))~2)"
{code}
{code:java}
"response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
  {
    "id":"888888",
    "author":"united",
    "subjects":["kingdom"],
    "score":7.757958},
  {
    "id":"77777",
    "author":"united kingdom",
    "score":5.874222}]
},
{code}

Minimum number of query terms matched within the same field (i.e.
all query terms required must be found in one of the fields) "PER FIELD"
{code:java}
sow = false
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax

"parsedquery_toString": "+(((author:united author:kingdom)~2) | (((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain))~1))"
{code}
This
(author:united author:kingdom)~2
means that both clauses must match for a document to be a candidate, in disjunction with
(subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain)~1
which means that at least one clause must match (because synonym expansion turned the two original terms into a single one).
{code:java}
"response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
  {
    "id":"77777",
    "author":"united kingdom",
    "score":5.874222}]
}
{code}

> eDismax
sow=false doesn't work with string field types
> ------------------------------------------------------
>
>                 Key: SOLR-15407
>                 URL: https://issues.apache.org/jira/browse/SOLR-15407
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 8.8.2
>            Reporter: Alessandro Benedetti
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, sow=false should not tokenize the input user query text, and should
> delegate query-time text analysis to each field.
> But what happens if one of the fields involved is not analyzed?
> For example, because it is a string field type?
> Terms are split and the generated query is broken:
> {code:java}
> // Document indexed before the test runs
> assertU(adoc("id", "75", "trait_ss", "multi term"));
>
> public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception {
>   // The whole query text should match the string field as a single term
>   assertJQ(req("qf", "trait_ss", "defType", "edismax", "q", "multi term", "sow", "false"),
>       "/response/numFound==1", "/response/docs/[0]/id=='75'");
>
>   // The parsed query should contain one clause, not one clause per whitespace-separated term
>   String parsedquery = getParsedQuery(
>       req("qf", "trait_ss", "q", "multi term", "defType", "edismax", "sow", "false", "debugQuery", "true"));
>   assertThat(parsedquery, anyOf(containsString("((trait_ss:multi term))")));
> }
> {code}
> This test currently fails.
> The current parsed query is, wrongly:
> (trait_ss:multi trait_ss:term)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org