[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836249#comment-16836249
 ] 

Fredrik Rodland edited comment on SOLR-12243 at 5/9/19 9:56 AM:
----------------------------------------------------------------

I am aware that this issue is closed, but nonetheless:

I think this change actually broke something in the expansion of synonyms for 
large queries (possibly large OR-queries).

Having {code}pf{code} enabled on fields with a substantial number of synonyms 
caused the pf portion of the query to grow "exponentially", to the point where 
one single query took down an entire Solr server.

By adjusting the number of OR clauses we were able to drive up the memory 
required for running the query.

Example (id has synonyms enabled, companyname does not):
{code:java}
q= ( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=companyname
{code}
results in the following pf part of the edismax query:

{code}(+DisjunctionMaxQuery((companyname:\"? samfunnsviter klima miljø ? ? 
psykolog rus ortopedi odontologi\"~5)~0.01)){code}

 
{code:java}
q= ( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=id companyname
{code}
results in the following pf part of the edismax query:

{code}(+DisjunctionMaxQuery(((id:\"samfunnsviter klima miljø psykolog rus 
ortopedi odontologi\"~5 id:\"samfunnsviter klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"samfunnsvitar klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"statsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"statsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykologspesialist rus ortopedi 
odontologi\"~5) | companyname:\"? samfunnsviter klima miljø ? ? psykolog rus 
ortopedi odontologi\"~5)~0.01)){code}

 

Increasing the number of OR terms or synonyms makes the id part of the 
query grow "exponentially".
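
The growth seen in the debug output above can be modelled as a Cartesian product over the alternatives for each term. The following is only a rough Python sketch of that arithmetic, not Solr's actual implementation: the SYNONYMS table and the expand_phrases helper are hypothetical names, and the synonym lists are reconstructed from the ten id phrases shown in the debug output.

```python
from itertools import product

# Hypothetical synonym table, reconstructed from the id phrases in the
# debug output; the real synonyms file is not shown in this comment.
SYNONYMS = {
    "samfunnsviter": [
        "samfunnsviter",
        "samfunnsvitar",
        "social scientist",
        "statsviter",
        "samfunnsøkonom",
    ],
    "psykolog": ["psykolog", "psykologspesialist"],
}


def expand_phrases(terms, synonyms):
    """Return one phrase per combination of synonym alternatives."""
    # A term without synonyms contributes exactly one alternative: itself.
    alternatives = [synonyms.get(term, [term]) for term in terms]
    return [" ".join(combo) for combo in product(*alternatives)]


terms = ["samfunnsviter", "klima", "miljø", "psykolog", "rus", "ortopedi", "odontologi"]
phrases = expand_phrases(terms, SYNONYMS)
print(len(phrases))  # 10 = 5 alternatives * 2 alternatives
```

Under this model the number of phrases is the product of the per-term alternative counts, so each additional term that has synonyms multiplies the size of the id part of the query.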


> Edismax missing phrase queries when phrases contain multiterm synonyms
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12243
>                 URL: https://issues.apache.org/jira/browse/SOLR-12243
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.1
>         Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>            Reporter: Elizabeth Haubert
>            Assignee: Steve Rowe
>            Priority: Major
>             Fix For: 7.6, 8.0
>
>         Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> multiword-synonyms.txt, schema.xml, solrconfig.xml
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> <requestHandler name="/test_qparse_error" class="solr.SearchHandler">
>  <lst name="defaults">
> <!-- Query settings -->
>  <str name="defType">edismax</str>
>  <str name="tie"> 0.4</str>
>  <str name="qf">title^100</str>
>  <str name="pf">title~20^5000</str>
>  <str name="pf2">title~11</str>
>  <str name="pf3">title~22^1000</str>
>  <str name="df">text</str>
>  <!-- mm If three or fewer clauses exist, they all must match. 
>  If four to six clauses exist, one can be missing. If seven to nine clauses 
> exist, all but three must match. 
>  If more than nine clauses exist, only require 30% to match.-->
>  <str name="mm">3&lt;-1 6&lt;-3 9&lt;30%</str>
>  <str name="q.alt">*:*</str>
>  <str name="rows">25</str>
> </lst>
> </requestHandler>
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  
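
For reference, the phrases listed as expected in the issue description above correspond to word n-grams of the query: pf uses the whole query as one phrase, pf2 uses word bigrams, and pf3 uses word trigrams. A minimal Python sketch of that expectation follows. The shingles helper is a hypothetical name, and the sketch ignores stopword gaps (the "?" positions), slop, and synonym handling, so it shows what should be generated rather than how edismax builds it.

```python
def shingles(terms, n):
    """Return every run of n consecutive terms, joined into a phrase."""
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]


terms = "allergic reaction dog".split()

pf = shingles(terms, len(terms))  # the full query as a single phrase
pf2 = shingles(terms, 2)          # word bigrams
pf3 = shingles(terms, 3)          # word trigrams

print(pf)   # ['allergic reaction dog']
print(pf2)  # ['allergic reaction', 'reaction dog']
print(pf3)  # ['allergic reaction dog']
```

Compared with this list, the issue reports that only the pf2 phrase "allergic reaction" is actually generated for that query.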



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
