[
https://issues.apache.org/jira/browse/CALCITE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030566#comment-18030566
]
Stamatis Zampetakis commented on CALCITE-7232:
----------------------------------------------
In this ticket, I don't want to put too much emphasis on how/where to expand
search. I have various ideas on the topic but its not my primary goal so I
would defer this discussion in a follow-up.
The main point that I would like to tackle here is about the IN (possibly
BETWEEN) and other operators being present in the RexNode tree. While, I fully
understand the benefits of having a restricted algebra I wouldn't ban any
standard operator from appearing in the rules. Projects can define their own
customized operators (inheriting from SqlOperator interface) and build rules
upon them so it's rather impossible to pick some of them and rule them out. In
fact, even now that SqlKind.IN is banned from RexCall people will find ways to
hack around it like we did in Hive (by using a custom
[HiveIn|https://github.com/apache/hive/blob/55d9ab7d6b00fa510be791b9de55974f61c90519/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveIn.java#L46]
operator). Moreover, IN, BETWEEN, etc., can be introduced via rules (e.g,
[HiveSearchRules|https://github.com/apache/hive/blob/55d9ab7d6b00fa510be791b9de55974f61c90519/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSearchRules.java#L32])
just for transformation (logical to physical) purposes.
It's true that if we leave IN/BETWEEN unrestricted, people may start logging
bugs/improvements around these in rules and other places but my personal
opinion is that we should review the requests on a per case basis and decide
then. At this point in time, I see a real need for having IN/BETWEEN in the
plans and forbidding those leads to hacky/customized/non-optimal solutions in
various projects.
> Restore use of IN operator in RexCall
> -------------------------------------
>
> Key: CALCITE-7232
> URL: https://issues.apache.org/jira/browse/CALCITE-7232
> Project: Calcite
> Issue Type: Task
> Reporter: Stamatis Zampetakis
> Priority: Major
>
> The use of {{IN}} operator in {{RexCall}} was superseded by the introduction
> of the {{SEARCH}} operator (CALCITE-4173) and its use is strictly forbidden
> through
> [assertions|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rex/RexCall.java#L94].
> The {{SEARCH}} operator is more general and powerful than {{IN}} so it's a
> perfect abstraction to use during the optimization phase.
> However, most databases don't have a {{SEARCH}} operator so the latter needs
> to be transformed back to {{IN}} (or something else) at some point in time.
> For instance, Apache Hive has two ways of generating an executable plan:
> * take a {{RelNode}} and generate an AST tree
> * take a {{RelNode}} and generate a Hive Operator tree
> both of which are eventually going to be executed.
> *If we don't allow* IN in a RexCall, then it means that we need to create
> special code to handle SEARCH in both code paths that differ only slightly in
> each case. (In reality the situation is more complicated for Hive because
> there are at least two more places where we need to do a SEARCH to IN
> transformation).
> *If we allow IN* in a RexCall, then at the end of the RelNode optimization
> phase we can "expand" {{SEARCH}} to {{IN}} so the transformation logic only
> appears in one place and it remains a {{RelNode}} to {{RelNode}} conversion.
> In fact, the same transformation logic could be exploited in
> [SqlImplementor|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L815]
> that does another {{RelNode}} to "something" conversion.
> The obvious downside with this proposal is that if people start mixing the IN
> operator in various optimization rules/phases it can certainly affect the
> quality of the plans and the planning time.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)