[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807393#comment-16807393
]
Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:28 AM:
-----------------------------------------------------------
[~julianhyde], consider another query whose join condition contains both an
equi condition and a non-equi condition:
{code:java}
SELECT t1.i_item_desc
FROM item t1
LEFT OUTER JOIN item_1 t2
  ON t1.i_item_sk = t2.i_item_sk AND t2.i_item_sk < 10000{code}
A merge join would also suit this query, but currently it is converted to a
nested-loop join.
I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
    boolean hasEquiKeys = !info.leftKeys.isEmpty()
        && !info.rightKeys.isEmpty();
    if (hasEquiKeys) {
      return convertToThetaMergeJoin(rel);
    } else {
      return new EnumerableThetaJoin(cluster, traitSet, left, right,
          join.getCondition(), join.getVariablesSet(), join.getJoinType());
    }
  } catch (Exception e) {
    EnumerableRules.LOGGER.debug(e.toString());
    return null;
  }
}
{code}
If the join has equi keys, it is converted to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right,
    info.getEquiCondition(left, right, cluster.getRexBuilder()),
    info.getRemaining(cluster.getRexBuilder()), info.leftKeys, info.rightKeys,
    join.getVariablesSet(), join.getJoinType());{code}
I implemented EnumerableThetaMergeJoin to handle a theta join that has equi
keys.
The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is
that EnumerableThetaMergeJoin uses a predicate generated from the remaining
(non-equi) part of the JoinInfo, and this predicate is applied to the
cartesians result of the merge join; see
[https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]
I made the following changes:
{code:java}
public TResult current() {
  final List<Object> list = cartesians.current();
  @SuppressWarnings("unchecked") final TSource left =
      (TSource) list.get(0);
  @SuppressWarnings("unchecked") final TInner right =
      (TInner) list.get(1);
  // apply the non-equi predicate to the current pair from cartesians
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
    if (generateNullsOnLeft) {
      return resultSelector.apply(null, right);
    }
    if (generateNullsOnRight) {
      return resultSelector.apply(left, null);
    }
  }
  return resultSelector.apply(left, right);
}
{code}
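To make the intended result of the example query concrete, here is a minimal, self-contained sketch of a left outer merge join with a residual (non-equi) predicate. It is not Calcite code and not the EnumerableThetaMergeJoin implementation: the class ThetaMergeJoinSketch, the method leftOuterJoin, and the use of plain sorted integer lists as stand-ins for the i_item_sk columns are all assumptions made for illustration. It tracks, per left row, whether any pair with an equal key passed the residual predicate, and emits a single null-padded row only when none did.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration only; none of these names exist in Calcite.
public class ThetaMergeJoinSketch {

  /**
   * Left outer merge join over two key lists sorted in ascending order.
   * The equi condition is key equality; "residual" is the non-equi part.
   */
  public static List<String> leftOuterJoin(List<Integer> left,
      List<Integer> right, BiPredicate<Integer, Integer> residual) {
    final List<String> out = new ArrayList<>();
    int r = 0; // start of the candidate run on the right side
    for (int l = 0; l < left.size(); l++) {
      final int key = left.get(l);
      // Skip right rows whose key is smaller than the current left key.
      while (r < right.size() && right.get(r) < key) {
        r++;
      }
      boolean matched = false;
      // Scan the run of equal keys. r is not advanced here, so a duplicate
      // left key rescans the same run.
      for (int j = r; j < right.size() && right.get(j) == key; j++) {
        if (residual.test(key, right.get(j))) {
          out.add(key + " | " + right.get(j));
          matched = true;
        }
      }
      // LEFT OUTER: keep the left row, null-padded, if nothing matched.
      if (!matched) {
        out.add(key + " | null");
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> item = Arrays.asList(1, 2, 2, 10000, 10001);
    List<Integer> item1 = Arrays.asList(2, 3, 10000, 10000);
    // Residual predicate mirrors "t2.i_item_sk < 10000" from the query.
    for (String row : leftOuterJoin(item, item1, (k1, k2) -> k2 < 10000)) {
      System.out.println(row);
    }
  }
}
```

In this sample data the left key 10000 has two right rows with an equal key, but both fail the residual predicate, so the left row appears once with a null right side rather than once per failed pair.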
> Make EnumerableMergeJoinRule to support a theta join
> ----------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
>
> Currently, EnumerableMergeJoinRule supports only inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join takes dozens of times longer than a sort-merge
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve performance greatly.