[jira] [Comment Edited] (CALCITE-7232) Restore use of IN operator in RexCall

Steve Carlin (Jira) Sun, 19 Oct 2025 08:50:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030965#comment-18030965
 ]


Steve Carlin edited comment on CALCITE-7232 at 10/19/25 3:49 PM:
-----------------------------------------------------------------

Another idea I'm gonna throw out there.

[~julianhyde]  I'm not sure what you mean by "planning phase" and how much that 
covers.  But if we are specifically talking about rules, perhaps the burden 
should be placed on the rule (or wherever it is implemented, as [~zabetak] 
said).

The rule can easily do this by calling RexSimplify.simplify(), with simplify() 
supporting the simplification of the IN operator to SEARCH.

One concern I would have with this is that the simplify() method has a cost. In 
general, I'm very wary about introducing non-final members into classes, but in 
this case, I think it is a good exception: I would suggest creating a "boolean 
isSimplified" variable and a "void setSimplified()" method in the RexNode class 
(as well as a way to set it in the constructor).  When simplification happens, 
it can check to see if it is already simplified before attempting 
simplification.   This variable would also have the nice benefit of allowing 
callers to avoid internal simplification and the creation of the SEARCH 
operator.  I would also not allow any way to unset the isSimplified variable 
once it is marked as true.
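
A rough sketch of the flag idea, to make the proposal concrete. SketchNode and 
SketchSimplifier are hypothetical stand-ins for RexNode and RexSimplify, not 
Calcite classes, and the string rewrite is only a placeholder for the real IN 
to SEARCH conversion:

```java
public class SimplifiedFlagSketch {

  /** Hypothetical stand-in for RexNode; not the real Calcite class. */
  static class SketchNode {
    final String digest;
    private boolean simplified; // the single non-final member

    SketchNode(String digest, boolean simplified) {
      this.digest = digest;
      this.simplified = simplified;
    }

    boolean isSimplified() {
      return simplified;
    }

    /** Takes no argument, so the flag can never be switched back to false. */
    void setSimplified() {
      simplified = true;
    }
  }

  /** Hypothetical stand-in for RexSimplify; counts how often the costly path runs. */
  static class SketchSimplifier {
    int expensiveCalls = 0;

    SketchNode simplify(SketchNode node) {
      if (node.isSimplified()) {
        return node; // nothing to do: already simplified, or the caller opted out
      }
      expensiveCalls++;
      // Placeholder for the real rewrite (such as IN to SEARCH).
      return new SketchNode(node.digest.replace("IN(", "SEARCH("), true);
    }
  }

  public static void main(String[] args) {
    SketchSimplifier simplifier = new SketchSimplifier();

    SketchNode once = simplifier.simplify(new SketchNode("IN($0, 1, 2)", false));
    simplifier.simplify(once); // second call returns early

    SketchNode keepIn = new SketchNode("IN($0, 1, 2)", false);
    keepIn.setSimplified(); // caller opts out, so IN is left alone
    simplifier.simplify(keepIn);

    System.out.println(once.digest + " / " + keepIn.digest
        + " / expensive calls: " + simplifier.expensiveCalls);
  }
}
```

The same flag serves both purposes here: it short-circuits repeated 
simplification, and a caller that sets it up front keeps its IN untouched.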

Edit: If we are strongly against the idea of mutable variables, I think it 
would be ok to just have it only set by a constructor.  And then perhaps the 
RexNode creators within "simplify()" would set this to true.  But this might 
result in an explosion of "RexBuilder.make*" calls, or limit the ability for 
callers to avoid internal simplification on certain RexNodes.
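
For comparison, a sketch of the constructor-only variant, again with 
hypothetical stand-ins (SketchNode and SketchBuilder are not Calcite classes). 
The flag is final, so the only way to mark a node is at creation time, which 
is where the extra "make*" overloads come from:

```java
public class ImmutableFlagSketch {

  /** Hypothetical stand-in for RexNode with a final flag. */
  static final class SketchNode {
    final String digest;
    final boolean simplified;

    SketchNode(String digest, boolean simplified) {
      this.digest = digest;
      this.simplified = simplified;
    }
  }

  /** Hypothetical stand-in for RexBuilder. */
  static class SketchBuilder {
    SketchNode makeCall(String digest) {
      return makeCall(digest, false);
    }

    // Each existing make* method would need a twin like this one.
    SketchNode makeCall(String digest, boolean simplified) {
      return new SketchNode(digest, simplified);
    }
  }

  /** Stand-in for simplify(): it creates its results already marked. */
  static SketchNode simplify(SketchBuilder builder, SketchNode node) {
    if (node.simplified) {
      return node;
    }
    // Placeholder for the real rewrite (such as IN to SEARCH).
    return builder.makeCall(node.digest.replace("IN(", "SEARCH("), true);
  }

  public static void main(String[] args) {
    SketchBuilder builder = new SketchBuilder();
    SketchNode raw = builder.makeCall("IN($0, 1, 2)");
    SketchNode done = simplify(builder, raw);
    System.out.println(raw.digest + " -> " + done.digest);
  }
}
```

In this version a caller can still opt out, but only by building the node 
through the flagged overload in the first place.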


> Restore use of IN operator in RexCall
> -------------------------------------
>
>                 Key: CALCITE-7232
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7232
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>
> The use of {{IN}} operator in {{RexCall}} was superseded by the introduction 
> of the {{SEARCH}} operator (CALCITE-4173) and its use is strictly forbidden 
> through 
> [assertions|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rex/RexCall.java#L94].
>  The {{SEARCH}} operator is more general and powerful than {{IN}} so it's a 
> perfect abstraction to use during the optimization phase.
> However, most databases don't have a {{SEARCH}} operator so the latter needs 
> to be transformed back to {{IN}} (or something else) at some point in time. 
> For instance, Apache Hive has two ways of generating an executable plan:
>  * take a {{RelNode}} and generate an AST tree
>  * take a {{RelNode}} and generate a Hive Operator tree
> both of which are eventually going to be executed.
> *If we don't allow* IN in a RexCall, then it means that we need to create 
> special code to handle SEARCH in both code paths that differ only slightly in 
> each case. (In reality the situation is more complicated for Hive because 
> there are at least two more places where we need to do a SEARCH to IN 
> transformation).
> *If we allow IN* in a RexCall, then at the end of the RelNode optimization 
> phase we can "expand" {{SEARCH}} to {{IN}} so the transformation logic only 
> appears in one place and it remains a {{RelNode}} to {{RelNode}} conversion. 
> In fact, the same transformation logic could be exploited in 
> [SqlImplementor|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L815]
>  that does another {{RelNode}} to "something" conversion.
> The obvious downside with this proposal is that if people start mixing the IN 
> operator in various optimization rules/phases it can certainly affect the 
> quality of the plans and the planning time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
