[ https://issues.apache.org/jira/browse/IMPALA-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950366#comment-17950366 ]

ASF subversion and git services commented on IMPALA-11604:
----------------------------------------------------------

Commit 3210ec58c5ff7e3633afa7f596ed7d517ec8d0d9 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3210ec58c ]

IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances

IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of an UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances on
a single node.

This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.

Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from
over-parallelizing a fragment. Note that this sanity check can also cap
the MAX_FS_WRITERS option under a similar scenario.
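
The scheduler-side sanity check described above can be sketched roughly
as follows. This is an illustration only, not the actual Impala code:
the function name bound_max_instances and its parameters are
hypothetical, and it assumes the cap is the number of distinct hosts
running the input (left child) fragment multiplied by
max_parallelism_per_node.

```python
from typing import Sequence


def bound_max_instances(requested_instances: int,
                        input_hosts: Sequence[str],
                        max_parallelism_per_node: int) -> int:
    """Hypothetical sketch: cap a parent fragment's instance count.

    Assumption (not taken from the Impala source): a fragment whose
    instances are collocated with its input may run at most
    max_parallelism_per_node instances on each host that runs the input.
    """
    if requested_instances < 1 or max_parallelism_per_node < 1:
        raise ValueError("instance counts must be positive")
    # Count distinct hosts; an UNPARTITIONED input runs on exactly one.
    num_hosts = max(1, len(set(input_hosts)))
    cap = num_hosts * max_parallelism_per_node
    return min(requested_instances, cap)
```

Under these assumptions, a request for 16 instances against an input
that lives on a single host with max_parallelism_per_node=4 would be
reduced to 4, while the same request against an input spread over three
hosts would be reduced to 12.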

This patch also adds a ScalingVerdict enum and logs it at TRACE level
to show the scaling decision steps.

Testing:
- Add a planner test and an e2e test that exercise the corner case
  under the COMPUTE_PROCESSING_COST=1 option.
- Manually comment out the bounding logic in
  traverseEffectiveParallelism() and confirm that the scheduler's sanity
  check still enforces the bounding.

Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Reviewed-on: http://gerrit.cloudera.org:8080/22840
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Planner changes for CPU usage
> -----------------------------
>
>                 Key: IMPALA-11604
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11604
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Qifan Chen
>            Assignee: Riza Suminto
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>
> Plan scaling based on estimated peak memory has been enabled in 
> IMPALA-10992. However, it is sometimes desirable to consider CPU usage (such 
> as the amount of data processed) as a scaling factor. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
