[ 
https://issues.apache.org/jira/browse/CALCITE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930650#comment-17930650
 ] 

Gonzalo Ortiz edited comment on CALCITE-6467 at 2/26/25 11:10 AM:
------------------------------------------------------------------

After doing some tests, it looks to me that the 
[SqlToRelConverter.expand|https://github.com/apache/calcite/blob/638d3c433182840fc51b5ec8dd9077a50947269f/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6423]
 property introduced in 1.39 is exactly what Pinot is looking for.

Edit: The feature was not introduced there, but the commit that fixes our 
problem is 
[https://github.com/apache/calcite/commit/84ea4be50b01cd9539cb555c298c21172089249b|https://github.com/apache/calcite/commit/84ea4be50b01cd9539cb555c298c21172089249b,]

I'll be taking a look to this version release and try to update Pinot code to 
this Caclcite version to use this new mode. I think we can close this issue now.

> I have in mind similar discussions around the type system, constant folding, 
> even simplifications, some people fork (and then regret later), some others 
> invest time and energy to improve things upstream for everyone to benefit 
> (including themselves).

Sure! I think forking Calcite or using our own relational algebra 
implementation would be a very bad idea. Instead, I would love to contribute 
the main repo with what we need. In fact I was trying to write a patch to 
support something similar to this expand mode when I found that.


was (Author: JIRAUSER301988):
After doing some tests, it looks to me that the 
[SqlToRelConverter.expand|https://github.com/apache/calcite/blob/638d3c433182840fc51b5ec8dd9077a50947269f/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6423]
 property introduced in 1.39 is exactly what Pinot is looking for.

I'll be taking a look to this version release and try to update Pinot code to 
this Caclcite version to use this new mode. I think we can close this issue now.

> I have in mind similar discussions around the type system, constant folding, 
> even simplifications, some people fork (and then regret later), some others 
> invest time and energy to improve things upstream for everyone to benefit 
> (including themselves).

Sure! I think forking Calcite or using our own relational algebra 
implementation would be a very bad idea. Instead, I would love to contribute 
the main repo with what we need. In fact I was trying to write a patch to 
support something similar to this expand mode when I found that.

> Performance of RelMdUtil.checkInputForCollationAndLimit when using `where col 
> in (large literal set)`
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6467
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6467
>             Project: Calcite
>          Issue Type: Improvement
>    Affects Versions: 1.32.0, 1.33.0, 1.34.0, 1.35.0, 1.36.0, 1.37.0, 1.38.0
>            Reporter: Gonzalo Ortiz
>            Priority: Major
>         Attachments: MultistageEngineQuickStart_2024_07_12_111558.jfr, 
> image-2024-07-12-15-58-33-504.png
>
>
> Recently we have updated Pinot to use Calcite 1.37. Previously we were using 
> 1.31.
> After the upgrade, we have found issues when executing some queries that 
> include large IN clauses. Queries like:
> {code:java}
> explain plan for
> SELECT DestCityName
> FROM (
>          SELECT DestCityName
>          FROM airlineStats
>          WHERE DestCityName IN (
>                         'a1', 'a2', 'a3', ... 'a300'
>              )
>          GROUP BY DestCityName
>      ) as a
> {code}
> After some debug, we have found that the issue is in one of our custom rules (
> PinotSortExchangeCopyRule) when we call 
> `RelMdUtil.checkInputForCollationAndLimit` 
> ([link|https://github.com/apache/pinot/blob/dacc6d06907c44e83721454f1090e5f00c824f15/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotSortExchangeCopyRule.java#L64]).
>  
> Comparing different Pinot versions, I've found out that the change that is 
> causing problems in our scenario is #CALCITE-5036 (see
>  
> [https://github.com/apache/calcite/pull/2743)|https://github.com/apache/calcite/pull/2743/files).].
>  Specifically, it looks like the problem appears when `RelMdPredicates` tries 
> to simplify the expression, which at the time is basically an 
> `OR(DestCityName = a1, DestCityName = a2, ...)`. 
> `RexSimplify.simplifyOrTerms` negates previous terms in order to apply 
> possible optimizations, but in cases like this where we have hundreds of 
> literals, that is very expensive. Going more in detail, it looks like most of 
> the time is invested in creating new ranges:
> !image-2024-07-12-15-58-33-504.png!
>  
> You may find more insights in the attached JFR file
> [^MultistageEngineQuickStart_2024_07_12_111558.jfr][^MultistageEngineQuickStart_2024_07_12_111558.jfr]
>  
> I've tried to reproduce the problem with `sqline` but I wasn't able to do so. 
> As far as I can see in the code, RelMdUtil.checkInputForCollationAndLimit is 
> only called in SortJoinCopyRule, SortJoinTransposeRule and 
> SortUnionTransposeRule. I've tried to create a test or JMH benchmark in 
> Calcite to try to reproduce the issue, but I don't know codebase well enough.
>  
> I don't consider myself an expert on Apache Calcite and I know we are not 
> using Calcite in the most standard way (we are slowly migrating from our own 
> engine to Calcite), but I'm pretty confident this issue may also affect other 
> Calcite usages. At least in the trace I cannot see anything Pinot specific.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to