[ https://issues.apache.org/jira/browse/IMPALA-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933136#comment-17933136 ]
ASF subversion and git services commented on IMPALA-13587:
----------------------------------------------------------
Commit 5b4427ed1beacb13522245a30d304aefbb7afb07 in impala's branch
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5b4427ed1 ]
IMPALA-13587: Calcite planner: Outer join not aggregating nulls properly
The following query produces incorrect results because nulls are not
aggregated properly across multiple nodes:

  select t2.int_col y from alltypessmall t1 left outer join
  alltypestiny t2 on t1.int_col = t2.int_col group by 1

The cause is that the join conjunct of an outer join is registered in
the value equivalency graph. When a partitioned hash join is used,
there is an optimization that skips the aggregation step that merges
groups across nodes if, based on the value transfer graph, the planner
deduces that all rows with the same value of the partition column are
already on the same node.
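The skip-merge check can be modeled as a lookup in equivalence classes of columns. The following is a minimal sketch, assuming a union-find over column names; `ValueEquivalence`, `register`, and `can_skip_merge` are hypothetical names, not Impala's planner API. It also assumes that the join output is distributed on t1.int_col and that the merge may be skipped when every partition column is equivalent to some grouping column.

```python
# Hypothetical model of the "skip the merge aggregation" check.
# Names are illustrative; this is not Impala's planner code.


class ValueEquivalence:
    """Union-find over column names that are known to hold equal values."""

    def __init__(self):
        self.parent = {}

    def find(self, col):
        self.parent.setdefault(col, col)
        while self.parent[col] != col:
            # Path halving keeps the trees shallow.
            self.parent[col] = self.parent[self.parent[col]]
            col = self.parent[col]
        return col

    def register(self, lhs, rhs):
        """Record that lhs and rhs are equal in every output row."""
        self.parent[self.find(lhs)] = self.find(rhs)

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


def can_skip_merge(equiv, partition_cols, grouping_cols):
    """Rows with equal grouping keys are on the same node if every
    partition column is equivalent to some grouping column."""
    return all(
        any(equiv.equivalent(p, g) for g in grouping_cols)
        for p in partition_cols
    )


# Assumed setup for the query: the join output is partitioned on
# t1.int_col and the aggregation groups by t2.int_col.
equiv = ValueEquivalence()
equiv.register("t1.int_col", "t2.int_col")  # conjunct registered for the join
print(can_skip_merge(equiv, ["t1.int_col"], ["t2.int_col"]))  # True
```

With the conjunct registered, the check concludes that grouping on t2.int_col needs no merge step, which is the shortcut that produces wrong results for the outer join.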
The bug is that, even though the outer join uses an equi-conjunct, its
left and right sides are not always equal: when a row has no match on
the null-producing side of the outer join, that side's columns become
null.
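A toy simulation shows the effect of those null-extended rows. The sample values, the three nodes, and `value % NUM_NODES` as a stand-in for hash partitioning are assumptions chosen for illustration; they are not the contents of alltypessmall or alltypestiny, nor Impala's hash function.

```python
# Toy simulation (not Impala code): a left outer join breaks the
# assumption that t1.int_col = t2.int_col in every output row.

NUM_NODES = 3

# Hypothetical sample data; most t1 values have no match in t2.
t1_int_col = [0, 1, 2, 3, 4, 5, 6, 7]
t2_int_col = [0, 1]


def node_for(value):
    # Stand-in for hash partitioning on the join key.
    return value % NUM_NODES


# Left outer join: unmatched t1 rows get t2.int_col = None (SQL NULL),
# but they stay on the node chosen by their own t1.int_col.
joined = []
for a in t1_int_col:
    matches = [b for b in t2_int_col if b == a]
    for b in matches or [None]:
        joined.append((node_for(a), a, b))

# Per-node GROUP BY t2.int_col, with no merge step across nodes.
local_groups = []
for node in range(NUM_NODES):
    keys = {b for (n, _, b) in joined if n == node}
    local_groups.extend(keys)

# With a merge step, each distinct key appears exactly once.
merged_groups = set(local_groups)

print(local_groups.count(None))  # 3: one NULL group per node
print(len(local_groups))         # 5 rows returned without the merge
print(len(merged_groups))        # 3 rows returned with the merge
```

Each unmatched t1 row lands on a different node depending on its t1.int_col, so every node forms its own NULL group and the unmerged result repeats NULL.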
The fix is to avoid registering the equi-conjunct if the values are
not always equal.
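A minimal sketch of the fix under stated assumptions: an equivalence is recorded only when the join keeps both sides of the equi-conjunct equal in every output row. The commit message says only that the conjunct is not registered when the values are not always equal, so restricting registration to inner joins, and the names `JoinType`, `Equivalences`, and `register_join_conjunct`, are hypothetical.

```python
# Hypothetical sketch of the fix: only record an equivalence when the
# join guarantees both sides are equal in every output row.
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    LEFT_OUTER = "left outer"
    RIGHT_OUTER = "right outer"
    FULL_OUTER = "full outer"


class Equivalences:
    """Tracks classes of columns known to hold equal values."""

    def __init__(self):
        self.classes = []  # list of disjoint sets of column names

    def _class_of(self, col):
        for cls in self.classes:
            if col in cls:
                return cls
        cls = {col}
        self.classes.append(cls)
        return cls

    def register(self, lhs, rhs):
        a, b = self._class_of(lhs), self._class_of(rhs)
        if a is not b:
            a |= b  # merge b into a, then drop b by identity
            self.classes = [c for c in self.classes if c is not b]

    def equivalent(self, lhs, rhs):
        return any(lhs in cls and rhs in cls for cls in self.classes)


def register_join_conjunct(equiv, join_type, lhs, rhs):
    # In this sketch only inner joins register the conjunct: an outer join
    # emits null-extended rows where one side is NULL whatever the other is.
    if join_type is JoinType.INNER:
        equiv.register(lhs, rhs)


inner, outer = Equivalences(), Equivalences()
register_join_conjunct(inner, JoinType.INNER, "t1.int_col", "t2.int_col")
register_join_conjunct(outer, JoinType.LEFT_OUTER, "t1.int_col", "t2.int_col")
print(inner.equivalent("t1.int_col", "t2.int_col"))  # True
print(outer.equivalent("t1.int_col", "t2.int_col"))  # False
```

Because the left outer join no longer records t1.int_col = t2.int_col, the planner cannot treat the two columns as interchangeable when deciding whether to skip the merge aggregation, so the step that combines groups across nodes is kept.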
Change-Id: I57e9d4ad4c4af5a4c268e43ac2937064dab6ffd7
Reviewed-on: http://gerrit.cloudera.org:8080/22138
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Steve Carlin <[email protected]>
> Calcite planner: outer join not aggregating nulls properly
> ----------------------------------------------------------
>
> Key: IMPALA-13587
> URL: https://issues.apache.org/jira/browse/IMPALA-13587
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Steve Carlin
> Priority: Major