Hi Hoss,

This is a really helpful explanation!
Even though I had already switched to using the {!terms} query for such
large boolean queries, it feels a lot better to know how and why
things behave differently compared to Solr 8.x.
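
A minimal sketch of the {!terms} approach in Java. The class and helper names (TermsQueryString, termsQuery) are hypothetical. The code only assembles the query string and relies on the terms parser's default comma separator, so all values travel in one query rather than one boolean clause per value.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a {!terms} local-params query string instead of
// a long "1 2 3 ..." boolean query with one clause per value.
public class TermsQueryString {

    // Joins the values with ",", the terms parser's default separator.
    static String termsQuery(String field, List<Long> values) {
        String joined = values.stream()
                .map(Object::toString)
                .collect(Collectors.joining(","));
        return "{!terms f=" + field + "}" + joined;
    }

    public static void main(String[] args) {
        // prints {!terms f=categoryId}1,2,3
        System.out.println(termsQuery("categoryId", List.of(1L, 2L, 3L)));
    }
}
```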

Thanks!
Michael

On Tue, Dec 6, 2022 at 7:32 PM Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : I'm happy to provide some details, as I still do not really understand the
> : difference from the previous behavior.
>
> The main difference is coming from the changes introduced in LUCENE-8811
> (Lucene 9.0) which sought to ensure that the "global" maxClauseCount would
> be honored no matter what kind of nested structure the query might
> involve.
>
> Your situation is an interesting case that I had never considered; more
> details below...
>
> : * I upgraded from 8.11.1 to 9.1. I observed the behavior for a completely
> : rebuilt index (Solr version 9.1 / Lucene version 9.3)
>
> Thank you for clarifying. This confirms that the changes introduced
> in LUCENE-8811 (and related Solr issues) are relevant to the change in
> behavior you are seeing. (If you had said you upgraded from an earlier
> Solr 9 release, we'd be having a different conversation.)
>
> : * maxBooleanClauses is only configured in solrconfig.xml (1024) but not in
> : solr.xml.
>
> FYI: If you don't configure it in solr.xml, then the (Lucene) default
> IndexSearcher.getMaxClauseCount() is left as is (and that is also 1024).
>
> : * Sorry for the confusion about the field definition. As you already
> : assumed correctly: 'categoryId' is also a 'p_long_dv'
>
> Meaning that it has both points and docvalues configured, which, it turns
> out, is significant to why it behaves differently from a string field.
>
>
> : * Stacktrace for String field ("id"). For better readability I replaced the
> : original query by "1 2 ... 1025":
>
> Snipping down to the key lines of code from the root cause...
>
> : Caused by: org.apache.lucene.search.IndexSearcher$TooManyClauses:
> : maxClauseCount is set to 1024
> :         at org.apache.lucene.search.BooleanQuery$Builder.add(BooleanQuery.java:116)
> :         at org.apache.lucene.search.BooleanQuery$Builder.add(BooleanQuery.java:130)
> :         at org.apache.solr.parser.SolrQueryParserBase.rawToNormal(SolrQueryParserBase.java:1065)
>
> ...so in this case, as the query parser is building up a boolean query (of
> many strings), it is hitting the limit because the (top-level) boolean
> query is being asked to add one more clause than
> IndexSearcher.getMaxClauseCount() (1024) allows.
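
A simplified model of that top-level check, in Java. This is not Lucene's code: Builder and canBuild are hypothetical stand-ins for BooleanQuery.Builder, assuming it rejects an add once the builder already holds maxClauseCount (1024) clauses, which matches the failure on the 1025th string term.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Lucene's code) of the top-level clause check that
// runs while a boolean query is being built.
public class TopLevelClauseLimit {
    // Mirrors the default maxClauseCount of 1024.
    static final int MAX_CLAUSE_COUNT = 1024;

    static class TooManyClauses extends RuntimeException {
        TooManyClauses() {
            super("maxClauseCount is set to " + MAX_CLAUSE_COUNT);
        }
    }

    // Hypothetical stand-in for BooleanQuery.Builder: the check runs on every
    // add(), so the failure surfaces while the query parser builds the query.
    static class Builder {
        private final List<String> clauses = new ArrayList<>();

        Builder add(String clause) {
            if (clauses.size() >= MAX_CLAUSE_COUNT) {
                throw new TooManyClauses();
            }
            clauses.add(clause);
            return this;
        }
    }

    // Returns true if n top-level term clauses (id:1 ... id:n) can be added.
    static boolean canBuild(int n) {
        Builder builder = new Builder();
        try {
            for (int i = 1; i <= n; i++) {
                builder.add("id:" + i);
            }
            return true;
        } catch (TooManyClauses e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canBuild(1024)); // true
        System.out.println(canBuild(1025)); // false: the 1025th add() throws
    }
}
```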
>
>
> : * Stacktrace for Point field ("categoryId") with 1 2 ... 513:
>
> Again, snipping down to just the key lines of code. (Note also the
> difference in the exception message: "too many nested clauses")...
>
> : org.apache.lucene.search.IndexSearcher$TooManyNestedClauses: Query contains
> : too many nested clauses; maxClauseCount is set to 1024
> :         at org.apache.lucene.search.IndexSearcher$3.visitLeaf(IndexSearcher.java:801)
> :         at org.apache.lucene.document.SortedNumericDocValuesRangeQuery.visit(SortedNumericDocValuesRangeQuery.java:73)
> :         at org.apache.lucene.search.IndexOrDocValuesQuery.visit(IndexOrDocValuesQuery.java:121)
> :         at org.apache.lucene.search.BooleanQuery.visit(BooleanQuery.java:575)
> :         at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:769)
>
> ...here the exception is happening during the actual search -- meaning the
> query parser had no problem building up the BooleanQuery of 512 clauses
>
> But what matters is that each of those 512 clauses is no longer a simple
> exact term query (or a simple exact point query, or a simple exact
> docvalue query) ... because this fieldType is configured to support both
> points and docvalues, those 512 clauses are IndexOrDocValuesQuery queries
> -- which each contain 2 sub-clauses
>
> (The purpose of this class is to provide the most efficient implementation
> based on where/how this clause is used, which can depend on term stats,
> other clauses in the parent query, etc.)
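
A simplified model of why the points + docvalues field trips the limit sooner, in Java. Again this is not Lucene's code: Leaf, Parent, build and passesNestedCheck are hypothetical, and the model assumes the search-time check counts every leaf in the query tree against maxClauseCount (1024), with each IndexOrDocValuesQuery contributing 2 leaves.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Lucene's code) of a search-time check that walks the
// whole query tree and counts every leaf, not just top-level clauses.
public class NestedClauseLimit {
    static final int MAX_CLAUSE_COUNT = 1024;

    interface Node {
        int countLeaves();
    }

    // A single leaf, e.g. a term, points or docvalues query.
    static final class Leaf implements Node {
        public int countLeaves() {
            return 1;
        }
    }

    // Any query wrapping children: a boolean query, or an
    // IndexOrDocValuesQuery holding a points query and a docvalues query.
    static final class Parent implements Node {
        private final List<Node> children = new ArrayList<>();

        Parent add(Node child) {
            children.add(child);
            return this;
        }

        public int countLeaves() {
            int total = 0;
            for (Node child : children) {
                total += child.countLeaves();
            }
            return total;
        }
    }

    // Builds a top-level query of n user clauses; each clause is a bare leaf
    // (string field) or a wrapper of leavesPerClause leaves (points + docvalues).
    static Node build(int n, int leavesPerClause) {
        Parent top = new Parent();
        for (int i = 0; i < n; i++) {
            if (leavesPerClause == 1) {
                top.add(new Leaf());
            } else {
                Parent wrapper = new Parent();
                for (int j = 0; j < leavesPerClause; j++) {
                    wrapper.add(new Leaf());
                }
                top.add(wrapper);
            }
        }
        return top;
    }

    static boolean passesNestedCheck(Node query) {
        return query.countLeaves() <= MAX_CLAUSE_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(passesNestedCheck(build(1024, 1))); // true
        System.out.println(passesNestedCheck(build(512, 2)));  // true: 1024 leaves
        System.out.println(passesNestedCheck(build(513, 2)));  // false: 1026 leaves
    }
}
```

With two leaves per user clause the effective budget is halved: 512 clauses give 1024 leaves, while 513 give 1026, which matches the failure reported for "1 2 ... 513".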
>
> So to summarize:
>
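> 1) The reason you're seeing this behavior in 9.x but didn't in 8.x is
> that 9.x added more checks of the safety valve (including a search-time
> check that counts nested clauses, not just the clauses added to each
> BooleanQuery).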
>
> 2) The reason you're seeing the 1024 limit hit for some (but not all)
> fields, even with fewer than 1024 "original user query clauses", is that
> for some (but not all) field types, 1 original query clause can become N
> internal clauses (2, in the case of your points + docvalues field).
>
>
> -Hoss
> http://www.lucidworks.com/
>

Reply via email to