[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840140#comment-16840140
]
Ruben Quesada Lopez commented on CALCITE-2973:
----------------------------------------------
The problem seems to be in {{SemiJoinRule.java}}, which "creates a SemiJoin
from a Join on top of an Aggregate". The PR generates a Join with
condition = true and remainCondition = >($2, $0), but the SemiJoinRule (and, I
would tend to say, any existing rule involving a Join as operator) is not aware
of the new "remainCondition" attribute. It therefore takes only the join
condition (i.e. 'true') to create the SemiJoin, and the remainCondition is
lost.
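To make the failure mode concrete, here is a minimal sketch in plain Java. It
does not use Calcite at all: {{ToyJoin}}, {{unawareRule}} and {{awareRule}} are
hypothetical names invented for illustration, rows are reduced to single
integers, and "right > left" is only a stand-in for the remaining predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Toy model only: ToyJoin and both "rules" are hypothetical, not Calcite classes.
final class LostConditionDemo {

  /** A join that carries two predicates, like the Join generated by the PR. */
  static final class ToyJoin {
    final BiPredicate<Integer, Integer> condition;        // the only part existing rules read
    final BiPredicate<Integer, Integer> remainCondition;  // the new attribute

    ToyJoin(BiPredicate<Integer, Integer> condition,
            BiPredicate<Integer, Integer> remainCondition) {
      this.condition = condition;
      this.remainCondition = remainCondition;
    }
  }

  /** Semi-join: keep each left row that matches at least one right row. */
  static List<Integer> semiJoin(List<Integer> left, List<Integer> right,
                                BiPredicate<Integer, Integer> predicate) {
    List<Integer> out = new ArrayList<>();
    for (Integer l : left) {
      for (Integer r : right) {
        if (predicate.test(l, r)) {
          out.add(l);
          break;  // one match is enough for a semi-join
        }
      }
    }
    return out;
  }

  /** A rule unaware of remainCondition: it copies only the join condition. */
  static List<Integer> unawareRule(ToyJoin join, List<Integer> left, List<Integer> right) {
    return semiJoin(left, right, join.condition);
  }

  /** A rule that folds remainCondition into the predicate it uses. */
  static List<Integer> awareRule(ToyJoin join, List<Integer> left, List<Integer> right) {
    return semiJoin(left, right, join.condition.and(join.remainCondition));
  }

  public static void main(String[] args) {
    // condition = true, remainCondition = right > left (an illustrative stand-in)
    ToyJoin join = new ToyJoin((l, r) -> true, (l, r) -> r > l);
    List<Integer> left = Arrays.asList(1, 5, 9);
    List<Integer> right = Arrays.asList(4, 6);
    System.out.println(unawareRule(join, left, right)); // [1, 5, 9]
    System.out.println(awareRule(join, left, right));   // [1, 5]
  }
}
```

With the condition reduced to 'true', the unaware rule keeps every left row as
long as the right side is non-empty, so the row 9 (which has no larger partner
on the right) wrongly survives.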
This specific issue might be solved by modifying {{Join#analyzeCondition}}
(and maybe, subsequently, {{JoinInfo#of}}) so that any join with a non-null
(and non-always-true) remainCondition has a NonEquiJoinInfo as a result.
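The classification I have in mind could look roughly like the sketch below.
This is not the actual Calcite code: {{JoinInfoSketch}}, {{Kind}} and
{{classify}} are hypothetical names, and the remainCondition is modeled as a
plain string (null or "true" meaning "nothing left over") purely to keep the
example self-contained.

```java
// Hypothetical sketch of the proposed classification; not Calcite's JoinInfo.
final class JoinInfoSketch {

  enum Kind { EQUI, NON_EQUI }

  /**
   * Decides how a join should be classified.
   *
   * @param conditionIsPureEqui whether the join condition consists only of
   *                            equality comparisons between left and right keys
   * @param remainCondition     textual stand-in for the remaining predicate;
   *                            null or "true" means there is none
   */
  static Kind classify(boolean conditionIsPureEqui, String remainCondition) {
    boolean hasResidual = remainCondition != null
        && !"true".equalsIgnoreCase(remainCondition.trim());
    // Any real residual predicate forces the non-equi classification, so rules
    // that only handle equi-joins will not match and silently drop it.
    return (conditionIsPureEqui && !hasResidual) ? Kind.EQUI : Kind.NON_EQUI;
  }

  public static void main(String[] args) {
    System.out.println(classify(true, null));        // EQUI
    System.out.println(classify(true, ">($2, $0)")); // NON_EQUI
  }
}
```

The point of the design is that rules which bail out on non-equi joins would
then skip such a Join instead of rewriting it from the condition alone.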
In any case, looking into this, I'm starting to have some doubts about the
solution proposed in this PR, because it could break any rule involving a
Join. From now on, such rules (and any new ones to be created) would have to
consider the remainCondition predicate when processing their operators and
generating their output, and I fear that is something that can easily be
missed.
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently the EnumerableMergeJoinRule supports only inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join will take dozens of times longer than the sort-merge
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve performance greatly.