[ https://issues.apache.org/jira/browse/CALCITE-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023831#comment-18023831 ]

Ruben Q L commented on CALCITE-7202:
------------------------------------

IIRC Calcite converts an IN list into a sub-query when the number of values 
exceeds a certain threshold (20 by default), so theoretically a possible 
workaround could be to override this parameter so that the IN list is never 
converted into a sub-query and the OR expression is kept in all cases. This 
could be achieved by creating the SqlToRelConverter with the config parameter 
{{SqlToRelConverter.CONFIG.withInSubQueryThreshold(Integer.MAX_VALUE)}}
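A minimal sketch of what that configuration could look like, assuming Calcite 1.40's SqlToRelConverter API; the planner objects passed to the constructor (viewExpander, validator, catalogReader, cluster) are placeholders that the embedding application (e.g. Drill) would supply:

```java
// Sketch, not a tested fix: keep large IN lists as OR expressions by
// raising the sub-query conversion threshold above any realistic list size.
SqlToRelConverter.Config config = SqlToRelConverter.config()
    .withInSubQueryThreshold(Integer.MAX_VALUE); // default threshold is 20

// viewExpander, validator, catalogReader and cluster are placeholders
// for objects the caller already has from its planning context.
SqlToRelConverter converter = new SqlToRelConverter(
    viewExpander,                      // RelOptTable.ViewExpander
    validator,                         // SqlValidator
    catalogReader,                     // Prepare.CatalogReader
    cluster,                           // RelOptCluster
    StandardConvertletTable.INSTANCE,  // SQL-to-Rex convertlet table
    config);
```

With this config, a query such as {{WHERE employee_id IN (1, 2, ..., 60)}} would stay as a chain of OR/SARG predicates during SQL-to-Rel conversion instead of becoming a sub-query that SubQueryRemoveRule later has to rewrite.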


> Memory Explosion in Large IN Clauses
> ------------------------------------
>
>                 Key: CALCITE-7202
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7202
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.40.0
>            Reporter: Charles Givre
>            Priority: Major
>
> We have been trying to update Drill’s Calcite dependency from version 1.34 to 
> 1.40.  It has been quite a challenge, but I feel we are very close to a 
> working solution.  We’ve run into an issue which we can’t seem to solve and 
> would like to request some help from the Calcite community. 
> Here is a link to the draft PR: [https://github.com/apache/drill/pull/3024]. 
>  This may not have all the latest attempts but should be fairly recent. The 
> specific test you can use to verify this is here:
> https://github.com/apache/drill/blob/ee4c0236f8bc7d8f7c7b21f4e5c94939fe62e900/exec/java-exec/src/test/java/org/apache/drill/TestInList.java#L32-L42
> The gist of the issue is that when Drill is processing queries with large IN 
> clauses (60+ items), we observe exponential memory growth leading to 
> OutOfMemoryError or test timeouts. This occurs specifically during the 
> SUBQUERY_REWRITE planning phase.
> Here is a sample of a failing query:
> SELECT employee_id FROM cp.`employee.json`
> WHERE employee_id IN (1, 2, 3, ..., 60)  -- 60 items
> Basically, we’re getting the following errors:
> * Memory explosion during rule matching/firing
> * OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
> * Complete system hang requiring timeout termination
> To solve this, we tried a few different approaches including:
> * Rule-level fixes: Modified DrillSubQueryRemoveRule.matches() to detect and 
> skip large IN clauses
> * Apply-method handling: Added exception catching and fallback logic in 
> apply()
> * Threshold tuning: Tested various IN clause size limits (50, 100, 200 items)
> * Memory analysis: Confirmed the issue exists even with minimal rule 
> configurations
> We found that the memory explosion occurs before apply(). The issue 
> manifests itself during the rule matching/firing phase, not in rule 
> execution. The exponential growth pattern appears to be related to variant 
> creation or trait propagation. I will also add that this works fine with 
> ~20 items but fails consistently with 60+ items. These queries worked with 
> Calcite 1.34 but are failing as part of the upgrade to Calcite 1.40.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
