[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

Lai Zhou (JIRA) Wed, 15 May 2019 01:49:25 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840190#comment-16840190
 ]


Lai Zhou commented on CALCITE-2973:
-----------------------------------

[~rubenql], thanks, I understand it.

When a SemiJoin is created from an EnumerableJoin, the remainCondition is 
lost. That brings me back to my previous question:

Should we define EnumerableJoin as an EquiJoin or as a pure Join? If it is an 
EquiJoin, its condition contains only the equi part.

If we change EnumerableJoin to a pure Join, it will cause other problems; 
for example, FilterJoinRule will stop working.

My initial solution is to introduce an EnumerableThetaHashJoin to handle 
non-inner joins that contain a remainCondition.

EnumerableThetaHashJoin would be more like EnumerableThetaJoin, which is a 
Join rather than an EquiJoin, while EnumerableThetaHashJoin and 
Enumerable(Hash)Join could share the same hash join algorithm.

I think this solution is clearer and does no harm to the current rules.
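
To illustrate the idea, here is a minimal standalone sketch of a left outer 
hash join that applies a remaining (non-equi) condition after the hash lookup. 
It is not Calcite code: the class ThetaHashJoinSketch, the method leftJoin and 
the string output format are hypothetical names chosen for this example, and 
SQL NULL handling for join keys is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch only; this is not Calcite's join implementation. */
public class ThetaHashJoinSketch {

  /**
   * Left outer hash join. Candidate rows are found through a hash table on
   * the equi key, then filtered by the remaining (non-equi) condition.
   * A left row is padded with null only if no candidate passes the filter.
   * Note: SQL NULL semantics for keys are not modelled in this sketch.
   */
  public static <L, R, K> List<String> leftJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remainCondition) {
    // Build phase: index the right input by its equi key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      table.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe phase: look up each left row, then test the remaining condition.
    List<String> out = new ArrayList<>();
    for (L l : left) {
      boolean matched = false;
      List<R> bucket = table.get(leftKey.apply(l));
      if (bucket != null) {
        for (R r : bucket) {
          if (remainCondition.test(l, r)) {
            out.add(l + " | " + r);
            matched = true;
          }
        }
      }
      if (!matched) {
        out.add(l + " | null"); // outer-join padding for unmatched left rows
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> left = Arrays.asList(
        Arrays.asList(1, 10), Arrays.asList(2, 20), Arrays.asList(3, 30));
    List<List<Integer>> right = Arrays.asList(
        Arrays.asList(1, 5), Arrays.asList(1, 15), Arrays.asList(2, 25));
    // Equi part: column 0 = column 0. Remaining part: left.col1 > right.col1.
    List<String> rows = leftJoin(left, right,
        l -> l.get(0), r -> r.get(0),
        (l, r) -> l.get(1) > r.get(1));
    rows.forEach(System.out::println);
  }
}
```

In this example the left row [2, 20] finds a hash match in [2, 25] but fails 
the remaining condition, so it is emitted padded with null. If the 
remainCondition were dropped, that row would be joined incorrectly, which is 
the same kind of problem as the SemiJoin case above.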

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule supports only inner equi joins.
> If users run a theta-join query on a large dataset (such as 10000*10000), 
> the nested-loop join takes dozens of times longer than the sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
