[
https://issues.apache.org/jira/browse/IMPALA-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fang-Yu Rao updated IMPALA-14008:
---------------------------------
Description:
Currently, when the probe/left side of a hash join is executed by only a single
Impala host, the results of the hash join are distributed to only a single host
for further processing, e.g., aggregation. Performance could be improved if
multiple Impala hosts could participate in the aggregation in such a case.
For instance, consider the query {{select straight_join
count(functional.tinyinttable.int_col) from functional.tinyinttable,
functional.alltypes where functional.tinyinttable.int_col =
functional.alltypes.id}}. In the following distributed query plan, only one
Impala host works on the aggregation because the probe/left side of the join is
executed by only one Impala host. Note that we added {{straight_join}} to force
Impala's frontend to preserve the join order.
{code:java}
Operator                 #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------
F02:ROOT                 1       1      0.000ns   0.000ns                      0          0
06:AGGREGATE             1       1      0.000ns   0.000ns   1      1           16.00 KB   16.00 KB       FINALIZE
05:EXCHANGE              1       1      0.000ns   0.000ns   1      1           16.00 KB   16.00 KB       UNPARTITIONED
F00:EXCHANGE SENDER      1       1      0.000ns   0.000ns                      0          48.00 KB
03:AGGREGATE             1       1      0.000ns   0.000ns   1      1           29.00 KB   16.00 KB
02:HASH JOIN             1       1      0.000ns   0.000ns   10     5           1.98 MB    1.94 MB        INNER JOIN, BROADCAST
|--04:EXCHANGE           1       1      0.000ns   0.000ns   7.30K  7.30K       336.00 KB  52.52 KB       BROADCAST
|  F01:EXCHANGE SENDER   3       3      0.000ns   0.000ns                      9.62 KB    32.00 KB
|  01:SCAN HDFS          3       3      26.669ms  32.003ms  7.30K  7.30K       324.00 KB  160.00 MB      functional.alltypes
00:SCAN HDFS             1       1      4.000ms   4.000ms   10     5           29.00 KB   32.00 MB       functional.tinyinttable
{code}
On the other hand, for the same query without the {{straight_join}} hint, i.e.,
{{select count(functional.tinyinttable.int_col) from functional.tinyinttable,
functional.alltypes where functional.tinyinttable.int_col =
functional.alltypes.id}}, multiple Impala hosts execute the probe/left side of
the hash join, and multiple Impala hosts work on the aggregation, as shown in
the following plan.
{code:java}
Operator                 #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------
F02:ROOT                 1       1      0.000ns   0.000ns                      0          0
06:AGGREGATE             1       1      0.000ns   0.000ns   1      1           16.00 KB   16.00 KB       FINALIZE
05:EXCHANGE              1       1      0.000ns   0.000ns   3      3           32.00 KB   16.00 KB       UNPARTITIONED
F00:EXCHANGE SENDER      3       3      0.000ns   0.000ns                      31.00 B    48.00 KB
03:AGGREGATE             3       3      0.000ns   0.000ns   3      3           64.00 KB   16.00 KB
02:HASH JOIN             3       3      1.333ms   4.000ms   10     7.30K       1.98 MB    1.94 MB        INNER JOIN, BROADCAST
|--04:EXCHANGE           3       3      0.000ns   0.000ns   10     5           16.00 KB   16.00 KB       BROADCAST
|  F01:EXCHANGE SENDER   1       1      0.000ns   0.000ns                      118.00 B   32.00 KB
|  00:SCAN HDFS          1       1      0.000ns   0.000ns   10     5           29.00 KB   32.00 MB       functional.tinyinttable
01:SCAN HDFS             3       3      6.667ms   8.000ms   7.30K  7.30K       342.00 KB  160.00 MB      functional.alltypes
{code}
The example above is admittedly contrived, but this could be a real issue
whenever Impala produces a join order in which only a single Impala host
executes the probe/left side of a hash join. For example, Impala could produce
such a join order if there are many files in the smaller table (the one with
smaller cardinality) of the join and, as a result, this smaller table becomes
the probe/left side of a hash join.
For reference,
[Planner.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java]
shows that the number of hosts executing the hash join depends on the number of
hosts executing the left child, i.e., the probe-side child.
{code:java}
public static void invertJoins(PlanNode root, boolean isLocalPlan) {
  ...
  if (root instanceof JoinNode) {
    // Re-compute the numNodes and numInstances based on the new input order
    joinNode.recomputeNodes();
  }
}
{code}
Recall that {{recomputeNodes()}} is defined in
[JoinNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/JoinNode.java].
{code}
/**
 * Reset the numNodes_ and numInstances_ based on the left child
 */
public void recomputeNodes() {
  numNodes_ = getChild(0).numNodes_;
  numInstances_ = getChild(0).numInstances_;
}
{code}
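To make the dependency concrete, the following is a minimal, self-contained sketch of how the host count propagates from the probe side up to the aggregation. It is not Impala's planner code: {{SketchNode}}, {{join()}}, {{aggregate()}} and {{aggregationHosts()}} are hypothetical names introduced only for illustration, and the sketch assumes that an aggregation placed directly above the join in the same fragment runs on the same hosts as the join.

```java
// Toy model only. SketchNode is a hypothetical stand-in for Impala's PlanNode;
// none of these names exist in the Impala code base.
class JoinHostCountSketch {

  static class SketchNode {
    final String label;
    final int numNodes;  // number of hosts executing this node

    SketchNode(String label, int numNodes) {
      this.label = label;
      this.numNodes = numNodes;
    }
  }

  // Mirrors the rule quoted from recomputeNodes(): the join takes its host
  // count from child 0, i.e., the probe/left side. The build side is ignored.
  static SketchNode join(SketchNode probe, SketchNode build) {
    return new SketchNode("HASH JOIN", probe.numNodes);
  }

  // Assumption of this sketch: an aggregation directly above the join, in the
  // same fragment, runs on the same hosts as its input.
  static SketchNode aggregate(SketchNode input) {
    return new SketchNode("AGGREGATE", input.numNodes);
  }

  // Returns the number of hosts that end up running the aggregation.
  static int aggregationHosts(int probeHosts, int buildHosts) {
    SketchNode probe = new SketchNode("SCAN (probe)", probeHosts);
    SketchNode build = new SketchNode("SCAN (build)", buildHosts);
    return aggregate(join(probe, build)).numNodes;
  }

  public static void main(String[] args) {
    // Plan with straight_join: tinyinttable (1 host) is the probe side.
    System.out.println("probe on 1 host:  " + aggregationHosts(1, 3));
    // Plan without the hint: alltypes (3 hosts) is the probe side.
    System.out.println("probe on 3 hosts: " + aggregationHosts(3, 1));
  }
}
```

Under these assumptions the first call yields 1 and the second yields 3, which matches the #Hosts column of {{03:AGGREGATE}} in the two plans above.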
> Consider distributing the results of hash join to multiple Impala hosts when
> only one single Impala host is executing the probe side
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-14008
> URL: https://issues.apache.org/jira/browse/IMPALA-14008
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Fang-Yu Rao
> Priority: Major
>