Hi Solr community,

We are currently upgrading Solr from v8 to v9 and have stumbled upon an
issue: our search queries now generate many more clauses than before,
hitting the maxBooleanClauses limit for some simple queries (even if we
increase the limit). I'll try to describe the issue as concisely as
possible:
When we search for a phrase like "SAP S/4HANA" and use synonym expansion,
in the parsedQuery in Solr 8, we can see this:

"parsedquery_toString":"+(spanNear([spanOr([body:sap-anwend,
body:sap-anwendungsbereich, body:sap-anwendungsbereich,
body:sap-bereich, body:sap-erfahr, body:sap-expertis ...

and so on, whereas the same search in Solr 9 yields a different parsedQuery:

"parsedquery_toString":"+((body:\"sap-anwend sap business suit 4
hana\" body:\"sap-anwend sap business suit 4 sap hana\"
body:\"sap-anwend sap business suit for hana\" ...

When we analyzed this, we noticed that Solr 9 creates a separate phrase
clause for every combination of a synonym of SAP with a synonym of
S/4HANA. Since the term SAP alone has about 300 synonyms in our
synonyms.txt, combining them with the synonyms for S/4HANA pushed the
number of clauses to over 2000. If we search for more terms and more
fields, this easily explodes into a giant parsedQuery and we get the
maxBooleanClauses error.
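
To illustrate the growth we are seeing, here is a minimal Python sketch
(not Solr code; the function names are ours and purely hypothetical). It
assumes one phrase clause per combination of synonyms, which is what the
Solr 9 parsedQuery above looks like to us. The terms are taken from the
output above; the count of 7 synonyms for S/4HANA is an assumed figure
chosen only to show how about 300 synonyms for SAP can lead to over 2000
clauses.

```python
from itertools import product
from math import prod


def expand_phrase(synonyms_per_position):
    """Return one phrase per combination, picking one synonym per position."""
    return [" ".join(combo) for combo in product(*synonyms_per_position)]


def count_phrase_clauses(synonym_counts):
    """Number of phrase clauses if every combination becomes its own clause."""
    return prod(synonym_counts)


# A few analyzed terms copied from the parsedQuery output above.
sap_variants = ["sap-anwend", "sap-bereich", "sap-erfahr"]
s4hana_variants = [
    "sap business suit 4 hana",
    "sap business suit 4 sap hana",
    "sap business suit for hana",
]

phrases = expand_phrase([sap_variants, s4hana_variants])
print(len(phrases))   # 3 * 3 = 9 phrase clauses
print(phrases[0])     # sap-anwend sap business suit 4 hana

# Assumed counts: ~300 synonyms for SAP, 7 (hypothetical) for S/4HANA.
print(count_phrase_clauses([300, 7]))  # 2100
```

The count is the product of the synonym counts per position, so every
additional term with synonyms multiplies the number of clauses instead of
adding to it.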

Looking at the documentation and the code, we could not figure out why
Solr 9 behaves differently, what exactly was changed in the
implementation, or what happened to spanNear and spanOr. The queries we
use in Solr 8 have not had any performance issues so far.

What are we missing? Is there a way to avoid creating combinations of
synonyms when searching for phrases? It does not seem to happen when we
run a regular search for both terms, SAP S/4HANA, without quotes.

One thing that we probably should do is minimize the number of
synonyms in our file, or give up on searching for multiword phrases.

I hope someone can enlighten us on this matter :)

Thank you!




Best regards

*Ivana Pranjic*
Software Developer

*GULP Information Services GmbH*


Phone: +49 89 500316717

E-Mail: ivana.pran...@gulp.de


*GULP - experts united*
www.gulp.de - a Randstad company

GULP Information Services GmbH
Registered office: München, Amtsgericht München HRB 207 941
Managing directors: Michel Verdoold (Chair), Arie Blom
