[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840190#comment-16840190
]
Lai Zhou commented on CALCITE-2973:
-----------------------------------
[~rubenql], thanks, I understand it now.
When a SemiJoin is created from an EnumerableJoin, the remainCondition is
lost. That brings me back to my previous question:
Should we define EnumerableJoin as an EquiJoin or as a plain Join? If it is an
EquiJoin, its condition contains only the equi part.
If we change EnumerableJoin to a plain Join, it will cause other problems;
for example, FilterJoinRule would no longer work.
My initial solution is to introduce an EnumerableThetaHashJoin to handle
non-inner joins that contain a remainCondition.
EnumerableThetaHashJoin is more like EnumerableThetaJoin, which is a
Join rather than an EquiJoin,
and EnumerableThetaHashJoin and Enumerable(Hash)Join can share the same hash
join algorithm.
I think this solution is clearer and will do no harm to the current rules.
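
To illustrate the idea, here is a minimal sketch of a left outer hash join
that hashes on the equi key and checks the remaining (non-equi) condition on
every candidate pair during the probe phase. It is not Calcite code: the class
name ThetaHashJoinSketch, the method leftJoin and the int[] row layout are all
made up for this example. It shows why the remaining condition has to be
evaluated inside a non-inner join: if it were applied as a filter after an
equi-only left join, left rows whose candidates all fail it would be dropped
instead of being padded with null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical illustration only; none of these names exist in Calcite. */
public class ThetaHashJoinSketch {

  /**
   * Left outer hash join on an equi key, with a residual (non-equi)
   * predicate checked on each candidate pair. Every result row is a
   * two-element list [left, right]; right is null when no candidate
   * satisfies both the equi key and the residual predicate.
   */
  public static <L, R, K> List<List<Object>> leftJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> residual) {
    // Build phase: hash the right input on the equi key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      table.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe phase: look up candidates by key, then apply the residual.
    List<List<Object>> out = new ArrayList<>();
    for (L l : left) {
      boolean matched = false;
      List<R> candidates =
          table.getOrDefault(leftKey.apply(l), Collections.<R>emptyList());
      for (R r : candidates) {
        if (residual.test(l, r)) {
          out.add(Arrays.<Object>asList(l, r));
          matched = true;
        }
      }
      if (!matched) {
        // Outer-join padding: keep the left row even though nothing matched.
        out.add(Arrays.<Object>asList(l, null));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Rows are {key, value}; the join condition is
    // l.key = r.key AND l.value < r.value.
    List<int[]> left = Arrays.asList(new int[] {1, 10}, new int[] {2, 20});
    List<int[]> right = Arrays.asList(
        new int[] {1, 5}, new int[] {1, 15}, new int[] {2, 7});
    List<List<Object>> rows =
        leftJoin(left, right, l -> l[0], r -> r[0], (l, r) -> l[1] < r[1]);
    for (List<Object> row : rows) {
      int[] l = (int[]) row.get(0);
      int[] r = (int[]) row.get(1);
      System.out.println(Arrays.toString(l) + " | "
          + (r == null ? "null" : Arrays.toString(r)));
    }
  }
}
```

In this sample the left row {2, 20} has a key match in {2, 7} but fails the
residual, so it is kept and padded with null rather than lost.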
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join will take dozens of times longer than the sort-merge
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve performance greatly.