Hello Kurt Deschler, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22840

to look at the new patch set (#6).

Change subject: IMPALA-14006: Bound max_instances in 
CreateInputCollocatedInstances
......................................................................

IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances

IMPALA-11604 (part 2) changes how many instances to create in
Scheduler::CreateInputCollocatedInstances. This works when the left
child fragment of a parent fragment is distributed across nodes.
However, if the left child fragment instance is limited to only 1
node (the case of UNPARTITIONED fragment), the scheduler might
over-parallelize the parent fragment by scheduling too many instances in
a single node.

This patch attempts to mitigate the issue in two ways. First, it adds
bounding logic in PlanFragment.traverseEffectiveParallelism() to lower
parallelism further if the left (probe) side of the child fragment is
not well distributed across nodes.

Second, it adds TQueryExecRequest.max_parallelism_per_node to relay
information from Analyzer.getMaxParallelismPerNode() to the scheduler.
With this information, the scheduler can do additional sanity checks to
prevent Scheduler::CreateInputCollocatedInstances from over-parallelize
a fragment. Note that this sanity check can also cap MAX_FS_WRITERS
option under a similar scenario.

Added ScalingVerdict enum and TRACE log it to show the scaling decision
steps.

Testing:
- Add planner test and e2e test that exercise the corner case under
  COMPUTE_PROCESSING_COST=1 option.
- Manually comment the bounding logic in traverseEffectiveParallelism()
  and confirm that the scheduler's sanity check still enforces the
  bounding.

Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
---
M be/src/scheduling/scheduler.cc
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
A testdata/workloads/tpcds/queries/unpartitioned-probe.test
M tests/query_test/test_tpcds_queries.py
8 files changed, 250 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/22840/6
--
To view, visit http://gerrit.cloudera.org:8080/22840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65223b820c9fd6e4267d57297b1466d4e56829b3
Gerrit-Change-Number: 22840
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>

Reply via email to