[jira] [Updated] (SOLR-15449) edismax sow causes issues with minimum should match in case of multi field with different analysis

Alessandro Benedetti (Jira) Wed, 02 Jun 2021 03:30:04 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alessandro Benedetti updated SOLR-15449:
----------------------------------------
    Description: 
h1. Intro
In a multi-field search where the fields use different text analysis:

sow=true implies that minimum should match (mm) is applied "per document",
i.e. to be a match, a document must contain the mm required query terms, each 
at least once, in any of the queried fields.

sow=false implies that mm is applied "per field",
i.e. to be a match, a document must contain the mm required query terms, each 
at least once, within a single field.

When the parsed query moves from being term-centric (sow=true) to 
field-centric (sow=false and different text analysis per field), mm means two 
different things:


{code:java}
sow = true
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax
"parsedquery_toString":
"+(((author:united | subjects_as_same_term:united) (author:kingdom | 
subjects_as_same_term:kingdom))~2)"
{code}
{code:java}
"response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
      {
        "id":"888888",
        "author":"united",
        "subjects":["kingdom"],
        "score":7.757958},
      {
        "id":"77777",
        "author":"united kingdom",
        "score":5.874222}]
  },
{code}

"PER FIELD": the minimum number of query terms must be matched within the same 
field (i.e. all the required query terms must be found in one of the fields).


{code:java}
sow = false
mm=2
qf = author subjects_as_same_term
q = united kingdom
defType = edismax
"parsedquery_toString":
"+(((author:united author:kingdom)~2) | 
(((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" 
subjects_as_same_term:england subjects_as_same_term:london 
subjects_as_same_term:british subjects_as_same_term:britain))~1))"
{code}

Here (author:united author:kingdom)~2 means that both clauses must match in 
the author field for a document to be a candidate. It is in disjunction with
(subjects_as_same_term:uk subjects_as_same_term:"united kingdom" 
subjects_as_same_term:england subjects_as_same_term:london 
subjects_as_same_term:british subjects_as_same_term:britain)~1, which means 
that at least one clause must match (because synonym expansion turned the two 
original terms into a single one).


{code:java}
"response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
      {
        "id":"77777",
        "author":"united kingdom",
        "score":5.874222}]
  }
{code}
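
To make the difference concrete, here is a minimal Python sketch that mimics 
the shape of the two parsed queries above on the two sample documents. It is a 
toy model, not Solr code: the function names are made up for illustration, the 
indexed tokens of subjects_as_same_term for document 888888 are assumed to be 
just "kingdom", and the phrase "united kingdom" is treated as a single token.

```python
# Toy model of the two parsed queries above (illustrative only, not Solr code).
# Assumption: each field is reduced to a set of indexed tokens.
docs = {
    "888888": {"author": {"united"}, "subjects_as_same_term": {"kingdom"}},
    "77777": {"author": {"united", "kingdom"}, "subjects_as_same_term": set()},
}


def term_centric(doc, terms, fields, mm):
    """sow=true shape: one (f1:t | f2:t) clause per term, mm counted over terms."""
    matched = sum(any(t in doc[f] for f in fields) for t in terms)
    return matched >= mm


def field_centric(doc, groups):
    """sow=false shape: one (clauses)~mm group per field, groups OR-ed together."""
    return any(
        sum(c in doc[field] for c in clauses) >= mm
        for field, (clauses, mm) in groups.items()
    )


fields = ["author", "subjects_as_same_term"]

# sow=true: ((author:united | s:united) (author:kingdom | s:kingdom))~2
sow_true_hits = [
    doc_id
    for doc_id, doc in docs.items()
    if term_centric(doc, ["united", "kingdom"], fields, mm=2)
]

# sow=false: (author:united author:kingdom)~2 | (s:uk s:"united kingdom" ...)~1
groups = {
    "author": (["united", "kingdom"], 2),
    "subjects_as_same_term": (
        ["uk", "united kingdom", "england", "london", "british", "britain"],
        1,
    ),
}
sow_false_hits = [doc_id for doc_id, doc in docs.items() if field_centric(doc, groups)]

print(sow_true_hits)
print(sow_false_hits)
```

Under these assumptions the term-centric form accepts both documents, while 
the field-centric form accepts only 77777, which is consistent with the two 
responses shown above.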

h1. Problem
When a field's text analysis is incompatible with the query text, mm is not 
fully respected:

{code:java}
sow = false
mm=100%
qf = text numeric_i
q = terminator 100
defType = edismax
"parsedquery_toString":
"+(((text:terminator text:100)~2) | 
((numeric_i:100)~1))"
{code}

A document that contains only '100' in the numeric_i field is returned as a 
match, even though it does not satisfy mm=100%: the term 'terminator' is not 
matched in any field.
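
The effect can be reproduced with a small Python sketch of the parsed query 
above. This is only an illustration under stated assumptions: the helper names 
are hypothetical, a percentage mm is simplified to "percentage of the clause 
count, rounded down", and the sample document is invented to contain nothing 
but the value 100 in numeric_i.

```python
# Toy model of "+(((text:terminator text:100)~2) | ((numeric_i:100)~1))".
# Illustrative only; names and the sample document are assumptions.


def mm_for(clause_count, mm_percent):
    """Simplified percentage mm: share of the clauses, rounded down."""
    return clause_count * mm_percent // 100


def field_centric(doc, groups, mm_percent):
    """sow=false shape: mm is computed per field group, groups are OR-ed."""
    return any(
        sum(c in doc[field] for c in clauses) >= mm_for(len(clauses), mm_percent)
        for field, clauses in groups.items()
    )


def per_query_terms(doc, terms, mm_percent):
    """What mm=100% suggests to a user: every query term matched somewhere."""
    matched = sum(any(t in tokens for tokens in doc.values()) for t in terms)
    return matched >= mm_for(len(terms), mm_percent)


# Hypothetical document: only the numeric field holds a value.
doc = {"text": set(), "numeric_i": {"100"}}

# The numeric field keeps a single clause, so its group needs only 1 match.
groups = {"text": ["terminator", "100"], "numeric_i": ["100"]}

per_field_match = field_centric(doc, groups, mm_percent=100)
per_query_match = per_query_terms(doc, ["terminator", "100"], mm_percent=100)

print(per_field_match)
print(per_query_match)
```

In this sketch the numeric_i group has one clause, so mm=100% resolves to 1 
for that group and the document is accepted, although only one of the two 
query terms is matched.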



> edismax sow causes issues with minimum should match in case of multi field 
> with different analysis
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15449
>                 URL: https://issues.apache.org/jira/browse/SOLR-15449
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.8.2
>            Reporter: Alessandro Benedetti
>            Priority: Major
>


