Hello Ivana. I think the change that caused this is [LUCENE-9207] Don't build SpanQuery in QueryBuilder - ASF JIRA (apache.org) <https://issues.apache.org/jira/browse/LUCENE-9207>. Also, please check the last comments in Don't build SpanQuery in QueryBuilder [LUCENE-9207] · Issue #10247 · apache/lucene · GitHub <https://github.com/apache/lucene/issues/10247>, where I tried to discuss a way to reproduce the old, buggy nested span behavior with the fancy new intervals queries. So far it's stuck, and I don't know why.
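Here is a rough way to see why the clause count explodes. This is a minimal Python sketch that assumes each phrase position has an independent list of synonym alternatives. The lists and the helpers `count_clauses` and `expand_phrases` are hypothetical and are not Solr or Lucene APIs. The old spanNear([spanOr(...), spanOr(...)]) form needs one term query per alternative, so its size is a sum. Emitting one phrase query per synonym path, which is what the Solr 9 parsedQuery quoted below appears to contain, makes the size grow as a product.

```python
import itertools
import math


def count_clauses(alternatives_per_position):
    """Return (span_leaves, phrase_clauses) for one quoted phrase.

    span_leaves: nested spanNear/spanOr style, one term per alternative,
    so the total is the sum of the list sizes.
    phrase_clauses: one phrase per combination of alternatives, so for
    independent positions the total is the product of the list sizes.
    """
    sizes = [len(alts) for alts in alternatives_per_position]
    return sum(sizes), math.prod(sizes)


def expand_phrases(alternatives_per_position):
    """Enumerate one phrase string per combination (Cartesian product)."""
    return [" ".join(combo) for combo in itertools.product(*alternatives_per_position)]


# Hypothetical synonym lists, not taken from the real synonyms.txt.
alts = [["sap", "sap-anwend", "sap-erfahr"], ["s4hana", "4 hana", "4 sap hana"]]
print(count_clauses(alts))        # (6, 9)
print(len(expand_phrases(alts)))  # 9
```

With a hypothetical 300 alternatives for SAP and 7 for S/4HANA, the span form has 307 leaves, while the enumerated form has 2100 phrase clauses. That is roughly the scale reported below. Real synonym graphs with multi-word entries are more complex than independent positions, so treat this only as an approximation.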
On Tue, Mar 21, 2023 at 1:59 PM Ivana Pranjic <ivana.pran...@gulp.de.invalid> wrote:
> Hi Solr community,
>
> We are currently trying to upgrade Solr from v8 to v9, and we have stumbled
> upon an issue: the queries we use for search now generate many more
> clauses than before, hitting the maxBooleanClauses limit for some simple
> queries (even if we increase the limit). I'll try to describe our issue
> as concisely as possible.
> When we search for a phrase like "SAP S/4HANA" and use synonym expansion,
> the parsedQuery in Solr 8 looks like this:
>
> parsedquery_toString":"+(spanNear([spanOr([body:sap-anwend,
> body:sap-anwendungsbereich, body:sap-anwendungsbereich,
> body:sap-bereich, body:sap-erfahr, body:sap-expertis ...
>
> etc., whereas the same search in Solr 9 yields a different parsedQuery:
>
> "parsedquery_toString":"+((body:\"sap-anwend sap business suit 4
> hana\" body:\"sap-anwend sap business suit 4 sap hana\"
> body:\"sap-anwend sap business suit for hana\" ...
>
> When we analyzed it, we noticed that it creates a combination of every
> synonym of the term SAP with every synonym of S/4HANA. Since the term SAP
> alone has about 300 synonyms in our synonyms.txt, combining them with the
> synonyms for S/4HANA pushed the number of clauses to over 2000. If we
> search for more terms and fields, this easily explodes into a giant
> parsedQuery and we get the maxBooleanClauses error.
>
> Looking at the documentation and code, we could not figure out why
> Solr 9 behaves differently, what exactly changed in the implementation,
> and what happened to spanNear and spanOr. The queries we used in Solr 8
> have not had performance issues so far.
>
> What are we missing? Is there a way to avoid creating combinations of
> synonyms when searching for phrases? It does not seem to happen when
> doing a regular search for both terms SAP S/4HANA without quotes.
>
> One thing we probably should do is minimize the number of synonyms in
> our file, or give up on searching for multiword phrases.
>
> I hope someone can enlighten us on this matter :)
>
> Thank you!
>
> Best regards
>
> *Ivana Pranjic*
> Software Developer
>
> *GULP Information Services GmbH*
>
> Phone: +49 89 500316717
> E-Mail: ivana.pran...@gulp.de
>
> *GULP - experts united*
> www.gulp.de - a Randstad company
>
> GULP Information Services GmbH
> Registered office: Munich, Munich Local Court (Amtsgericht München) HRB 207 941
> Managing Directors: Michel Verdoold (Chair), Arie Blom

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!