[
https://issues.apache.org/jira/browse/IMPALA-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riza Suminto resolved IMPALA-14263.
-----------------------------------
Fix Version/s: Impala 5.0.0
Target Version: Impala 5.0.0
Resolution: Fixed
> Broadcast cost in planner is skewed by the number of nodes comparing to
> partition cost
> --------------------------------------------------------------------------------------
>
> Key: IMPALA-14263
> URL: https://issues.apache.org/jira/browse/IMPALA-14263
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Wenzhe Zhou
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> broadCast Cost = dataPayload + hashTblBuildCost = 2 x (rhsDataSize *
> leftChildNodes)
> partition Cost = Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize)
> The number of nodes skews broadcast cost on bigger clusters, which makes
> broadcast cost much bigger than partitioned join cost, e.g. planner favor
> partition strategy for big cluster.
> We probably need to introduce new heuristics to join strategy decision, like
> including number of nodes in partitioned join cost model. We also need a way
> to check for the degree of skew on the join key during the planning phase. If
> the skew is on the higher side, we would want to bias the cost model towards
> broadcast.
> Adding join hints in the query is the recommended workaround to force
> broadcast join in the cases where join keys are skewed, especially for larger
> clusters.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]