[
https://issues.apache.org/jira/browse/CALCITE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023866#comment-18023866
]
Julian Hyde commented on CALCITE-7213:
--------------------------------------
Should we even try to fight this battle?
If we pick a value of N for the supported OR-depth, someone in QA will test
Calcite with a tree of depth N+1 and duly report that Calcite is broken for N+1.
Queries are trees, and we use many other recursive algorithms. Converting them
all to iterative traversal is a lot of work, and will inevitably introduce bugs.
Before we jump into this, I would like to see the problem quantified (what are
the operators, what are the current limits, which pieces of code in the
validator and elsewhere would need to be changed) and I would like to see a
commitment from a company to resource it to completion.
> StackOverflowError in SqlValidatorImpl with long OR/AND predicate chains
> ------------------------------------------------------------------------
>
> Key: CALCITE-7213
> URL: https://issues.apache.org/jira/browse/CALCITE-7213
> Project: Calcite
> Issue Type: Bug
> Reporter: Viggo Chen
> Priority: Major
>
> h1. Problem Description:
> Apache Calcite's SQL validation stage is vulnerable to
> `java.lang.StackOverflowError` when processing queries that contain a large
> number of chained logical operators (`AND`/`OR`). This occurs because the
> parser generates a deeply nested, unbalanced Abstract Syntax Tree (AST) for
> these predicates. Subsequent recursive traversal during the validation phase
> exhausts the JVM's call stack, causing the process to fail.
> This issue prevents Calcite from handling complex, often machine-generated,
> SQL queries, which are prevalent in modern data systems.
> Steps to Reproduce:
> The issue can be reliably reproduced with a test case that generates a long
> chain of `OR` conditions. The following Java code creates a query with 5000
> `OR` predicates, which is sufficient to trigger the stack overflow during the
> validation step.
> {code:java}
> // Test case to generate the problematic SQL
> final String whereClause = java.util.stream.IntStream.range(0, 5000)
> .mapToObj(i -> "sal = " + i)
> .collect(java.util.stream.Collectors.joining(" OR "));
> // This will throw StackOverflowError during the validation phase
> sql("SELECT * FROM emp WHERE " + whereClause).ok(); {code}
> Generated SQL:
> {code:java}
> SELECT * FROM emp WHERE sal = 0 OR sal = 1 OR sal = 2 OR ... OR sal = 4999
> {code}
> h1. Stack Trace Analysis:
> A typical stack trace points to a deep recursion within the
> `SqlValidatorImpl` component, demonstrating that the failure happens during
> the validation and semantic analysis of the expression tree.
> {code:java}
> java.lang.StackOverflowError
> at org.apache.calcite.sql.SqlBasicCall.getKind(SqlBasicCall.java:83)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1458)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
> // ... repeats thousands of times {code}
> h1. Impact on System Stability:
> While a query with thousands of chained predicates may be considered
> suboptimal, the framework's response should be a graceful error (e.g., a
> validation exception), not a catastrophic failure. A
> `java.lang.StackOverflowError` is a critical, non-recoverable error that can
> terminate the executing thread and potentially bring down the entire service.
> This transforms a query-level issue into a severe system stability and
> reliability problem.
>
> h1. Suggested Solutions:
> I propose a two-part approach: an immediate defensive measure to improve
> stability, and long-term solutions to fundamentally fix the issue.
> h2. Part 1: Immediate Defensive Measure (Mitigation)
> Introduce a configurable limit on the maximum depth of the AST. This acts as
> a "safety fuse" to prevent stack overflows.
> - Mechanism: After parsing, or at the very beginning of the validation phase,
> perform a quick, non-recursive depth check of the `SqlNode` tree.
> - Configuration: This depth limit should be configurable (e.g., via
> `SqlParser.Config` or `SqlValidator.Config`). The default value should be set
> to a safe but reasonable number (e.g., 1000).
> - Action: If the AST depth exceeds the configured threshold, Calcite should
> immediately throw a standard `SqlParseException` or `SqlValidatorException`
> with a clear error message, such as "Query exceeds maximum expression depth
> limit of [N]".
> This provides users with an immediate way to protect their systems from
> problematic queries without waiting for a full architectural fix.
> h2. Part 2: Fundamental Fixes (Long-Term Solutions)
> To properly support these complex queries, the underlying cause of the deep
> recursion must be addressed.
> 1. Adopt Iterative Traversal (Recommended): The most robust solution is to
> refactor the recursive algorithms within `SqlValidatorImpl` and other core
> components to use an explicit, heap-allocated stack (e.g.,
> `java.util.Deque`). This converts the recursive traversal into an iterative
> one, which is not constrained by the call stack depth and can handle
> expression trees of any complexity. This holistically solves this class of
> problems.
> 2. Balanced Tree Construction: An alternative is to modify the parser's
> behavior. Instead of generating a left-deep tree for chained operators like
> `a OP b OP c OP d`, it could produce a balanced tree: `OP(OP(a, b), OP(c,
> d))`. A balanced tree's depth grows logarithmically (`log2(N)`) with the
> number of operators, which would naturally prevent stack overflows for even a
> very large number of predicates. This could be a configurable parser feature.
> By implementing the defensive measure from Part 1, we can provide immediate
> stability. Following up with one of the fundamental fixes from Part 2 will
> ensure Calcite's long-term resilience and scalability.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)