[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840140#comment-16840140
 ] 

Ruben Quesada Lopez commented on CALCITE-2973:
----------------------------------------------

The problem seems to be in {{SemiJoinRule.java}}, which "creates a SemiJoin 
from a Join on top of an Aggregate". The PR generates a Join with 
condition=true and remainCondition=>($2, $0), but SemiJoinRule (and, I would 
tend to say, every existing rule involving a Join operator) is not aware of 
the new "remainCondition" attribute. It therefore takes only the join 
condition (i.e. 'true') to create the SemiJoin, and the remainCondition is 
lost.
This specific issue might be solved by modifying {{Join#analyzeCondition}} 
(and maybe, in turn, {{JoinInfo#of}}) so that any join whose remainCondition 
is non-null and not always true yields a NonEquiJoinInfo.
In any case, looking into this, I'm starting to have some doubts about the 
solution proposed in this PR: it can potentially break any rule involving a 
Join, because from now on such rules (and any new ones) would have to take 
the remainCondition predicate into account when processing their operators 
and generating their output, and I fear that is easy to miss.
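
To make the failure mode and the suggested change concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not Calcite code: {{ToyJoin}}, {{ToyJoinInfo}}, {{analyze}} and the two "rule" methods are hypothetical names, conditions are plain strings, and the main condition is assumed to contain only equi predicates (or 'true').

```java
import java.util.Optional;

/** Toy model only; none of these types are Calcite classes. */
final class RemainConditionSketch {

  /** Hypothetical join node: a main condition plus the PR's extra remainCondition. */
  static final class ToyJoin {
    final String condition;        // assumed to hold only equi predicates, or "true"
    final String remainCondition;  // e.g. ">($2, $0)"; null when absent

    ToyJoin(String condition, String remainCondition) {
      this.condition = condition;
      this.remainCondition = remainCondition;
    }
  }

  /** Hypothetical stand-in for the result of analyzing a join condition. */
  static final class ToyJoinInfo {
    final boolean equi;
    final String fullCondition;

    ToyJoinInfo(boolean equi, String fullCondition) {
      this.equi = equi;
      this.fullCondition = fullCondition;
    }
  }

  /** A predicate is trivial when it is absent or always true. */
  static boolean isTrivial(String predicate) {
    return predicate == null || predicate.equals("true");
  }

  /** A rule unaware of remainCondition: it copies only the main condition. */
  static String naiveSemiJoinCondition(ToyJoin join) {
    return join.condition;
  }

  /** Suggested behaviour: a non-trivial remainCondition makes the join non-equi. */
  static ToyJoinInfo analyze(ToyJoin join) {
    if (isTrivial(join.remainCondition)) {
      return new ToyJoinInfo(true, join.condition);
    }
    String full = isTrivial(join.condition)
        ? join.remainCondition
        : "AND(" + join.condition + ", " + join.remainCondition + ")";
    return new ToyJoinInfo(false, full);
  }

  /** A rule that handles equi joins only, so it skips joins analyzed as non-equi. */
  static Optional<String> guardedSemiJoinCondition(ToyJoin join) {
    ToyJoinInfo info = analyze(join);
    return info.equi ? Optional.of(info.fullCondition) : Optional.empty();
  }

  public static void main(String[] args) {
    ToyJoin join = new ToyJoin("true", ">($2, $0)");
    System.out.println("naive rule uses:   " + naiveSemiJoinCondition(join));
    System.out.println("analyzed as equi:  " + analyze(join).equi);
    System.out.println("guarded rule uses: " + guardedSemiJoinCondition(join));
  }
}
```

In this model the naive rule silently drops the {{>($2, $0)}} predicate, while a rule that consults the analysis first declines to fire. That only helps rules that actually check the analysis result; a rule that reads the condition directly would still lose the predicate.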

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query on a large dataset (such as 10000*10000), 
> the nested-loop join will take dozens of times longer than the sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
