[
https://issues.apache.org/jira/browse/IMPALA-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012232#comment-18012232
]
Riza Suminto commented on IMPALA-14263:
---------------------------------------
Having a query option to tweak the broadcast cost would be beneficial. An admin
can set a cluster-wide default for the option to counter the cost scaling from
the number of executors in the cluster when supplying a query hint is not
feasible.
Filed a patch for this: https://gerrit.cloudera.org/c/23258/
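As a rough illustration of the idea (not the contents of the patch), the
sketch below applies a multiplier to the broadcast cost formula quoted in the
issue description. The names scaled_broadcast_cost, prefers_broadcast and
scale_factor are hypothetical and are not taken from Impala; how the real
option is named and applied is an assumption here.

```python
def scaled_broadcast_cost(rhs_data_size, left_child_nodes, scale_factor=1.0):
    """Broadcast cost from the issue description, times a hypothetical factor."""
    # The issue gives the unscaled cost as 2 x (rhsDataSize * leftChildNodes).
    unscaled = 2 * (rhs_data_size * left_child_nodes)
    # Assumption: the tunable acts as a plain multiplier on that cost.
    return scale_factor * unscaled


def prefers_broadcast(partition_cost, rhs_data_size, left_child_nodes,
                      scale_factor=1.0):
    """Return True when the (scaled) broadcast cost does not exceed partition cost."""
    cost = scaled_broadcast_cost(rhs_data_size, left_child_nodes, scale_factor)
    return cost <= partition_cost


if __name__ == "__main__":
    # Illustrative numbers only: 100 MB build side, 100 nodes.
    rhs, nodes, part_cost = 100_000_000, 100, 10_200_000_000
    for factor in (1.0, 0.5):
        print(factor, prefers_broadcast(part_cost, rhs, nodes, factor))
```

With a factor below 1.0 set as the default, a large cluster would lean back
towards broadcast without each query needing a hint.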
> Broadcast cost in planner is skewed by the number of nodes comparing to
> partition cost
> --------------------------------------------------------------------------------------
>
> Key: IMPALA-14263
> URL: https://issues.apache.org/jira/browse/IMPALA-14263
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Wenzhe Zhou
> Assignee: Riza Suminto
> Priority: Major
>
> broadcast cost = dataPayload + hashTblBuildCost = 2 x (rhsDataSize *
> leftChildNodes)
> partition cost = Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize)
> The number of nodes skews the broadcast cost on bigger clusters, making it
> much bigger than the partitioned join cost, i.e. the planner favors the
> partition strategy on big clusters.
> We probably need to introduce new heuristics for the join strategy decision,
> such as including the number of nodes in the partitioned join cost model. We
> also need a way to check the degree of skew on the join key during the
> planning phase. If the skew is high, we would want to bias the cost model
> towards broadcast.
> Adding join hints to the query is the recommended workaround to force a
> broadcast join when join keys are skewed, especially on larger clusters.
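
To make the skew described in the issue concrete, here is a minimal sketch of
the two quoted cost formulas. It assumes, purely for illustration, that
lhsNetworkCost and rhsNetworkCost equal the sizes of the two join inputs; the
issue does not define them, and the function names are not Impala's.

```python
def broadcast_cost(rhs_data_size, left_child_nodes):
    # Quoted formula: dataPayload + hashTblBuildCost
    #               = 2 x (rhsDataSize * leftChildNodes)
    return 2 * (rhs_data_size * left_child_nodes)


def partition_cost(lhs_network_cost, rhs_network_cost, rhs_data_size):
    # Quoted formula: Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize)
    return round(lhs_network_cost + rhs_network_cost + rhs_data_size)


def cheaper_strategy(lhs_size, rhs_size, num_nodes):
    # Assumption: each network cost equals the size of the data shuffled.
    b_cost = broadcast_cost(rhs_size, num_nodes)
    p_cost = partition_cost(lhs_size, rhs_size, rhs_size)
    return "broadcast" if b_cost <= p_cost else "partition"


if __name__ == "__main__":
    lhs, rhs = 10_000_000_000, 100_000_000  # illustrative: 10 GB and 100 MB
    for nodes in (10, 50, 100):
        print(nodes, cheaper_strategy(lhs, rhs, nodes))
```

Only the broadcast cost grows with the node count, so the same pair of inputs
flips from broadcast to partition once the cluster is large enough.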