Hello Joe McDonnell, Steve Carlin, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22094
to look at the new patch set (#13).
Change subject: IMPALA-13533: Calcite CTE backend
......................................................................
IMPALA-13533: Calcite CTE backend
Adds support for CTEs in distributed planning. CTEs are structured like
an exchange, where one CTE fragment can feed multiple CTE exchanges.
Creates a LocalMultiSink as the sink of a CTE fragment, and Sequence
nodes are discarded in the distributed plan.
The multi-cast nature of CTEs creates a directed acyclic graph in our
fragment structure that Impala has not previously dealt with. A guard is
added to avoid scheduling the same fragment multiple times. Interior
fragments are also updated to schedule nodes based on the maximal
fragment input rather than the first, to ensure CTE scans are present
for all CTE buffers.
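
The scheduling changes above can be illustrated with a minimal sketch.
It is not the code in scheduler.cc; the names schedule_fragments,
inputs and leaf_hosts are hypothetical, and a fragment's placement is
reduced to a host count. It shows the two ideas only: a visited guard
so a fragment reachable through several CTE exchanges is scheduled
once, and sizing an interior fragment from its largest input rather
than its first.

```python
def schedule_fragments(root, inputs, leaf_hosts):
    """Schedule every fragment reachable from root exactly once.

    inputs:     fragment id -> list of fragment ids feeding it (a DAG).
    leaf_hosts: leaf fragment id -> number of hosts (hypothetical input).
    Returns (order, hosts): scheduling order and hosts per fragment.
    """
    hosts = {}
    order = []

    def visit(frag):
        # Guard: with CTEs the same fragment can be reached via several
        # parents, so skip anything that is already scheduled.
        if frag in hosts:
            return
        children = inputs.get(frag, [])
        for child in children:
            visit(child)
        if children:
            # Interior fragment: follow the largest input, not the first.
            hosts[frag] = max(hosts[c] for c in children)
        else:
            hosts[frag] = leaf_hosts[frag]
        order.append(frag)

    visit(root)
    return order, hosts


# One CTE fragment ("cte") feeds two consumers, forming a DAG.
dag = {"root": ["join1", "join2"], "join1": ["small", "cte"], "join2": ["cte"]}
order, hosts = schedule_fragments("root", dag, {"small": 1, "cte": 3})
```

In this example join1 is sized from "cte" (3 hosts) even though its
first input, "small", runs on a single host.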
Implements backend for CTEs in the Calcite planner. CTE output is added
to a LocalExchanger, then pulled concurrently. LocalExchangers are
registered with QueryState so all fragments can access them;
registration is done during plan fragment construction so all instances
can find the LocalExchanger or identify its absence. MT_DOP still needs
to be addressed, likely by constructing num_instances_per_node
LocalExchangers and providing each fragment instance a lookup index.
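
As a rough illustration of the registration pattern described above,
the sketch below models a per-query registry of exchangers. It is an
assumption-laden toy, not the C++ in local-exchanger.cc or
query-state.h: the method names (register_exchanger, find_exchanger,
add_batch, read) are hypothetical, and it assumes every consumer sees
the full CTE output.

```python
import threading


class LocalExchanger:
    """Buffers producer batches; each reader keeps its own cursor."""

    def __init__(self):
        self._cond = threading.Condition()
        self._batches = []
        self._done = False

    def add_batch(self, batch):
        with self._cond:
            self._batches.append(batch)
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def read(self):
        """Yield every batch in order, blocking until more data or close."""
        pos = 0
        while True:
            with self._cond:
                while pos >= len(self._batches) and not self._done:
                    self._cond.wait()
                if pos >= len(self._batches):
                    return
                batch = self._batches[pos]
            pos += 1
            yield batch


class QueryState:
    """Holds exchangers so any fragment of the query can look them up."""

    def __init__(self):
        self._exchangers = {}

    def register_exchanger(self, cte_id):
        # Called while fragments are constructed, before any instance runs.
        return self._exchangers.setdefault(cte_id, LocalExchanger())

    def find_exchanger(self, cte_id):
        # None tells the caller that no exchanger exists for this CTE.
        return self._exchangers.get(cte_id)


qs = QueryState()
qs.register_exchanger(0)
results = {}


def consume(name):
    results[name] = list(qs.find_exchanger(0).read())


readers = [threading.Thread(target=consume, args=(n,)) for n in ("a", "b")]
for t in readers:
    t.start()
for batch in ([1, 2], [3]):
    qs.find_exchanger(0).add_batch(batch)
qs.find_exchanger(0).close()
for t in readers:
    t.join()
```

Registering before any reader starts is what lets a lookup that returns
nothing be treated as a definite absence rather than a race.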
Mimics UnionNode's MaterializeBatch for translating the CTE tuple to the
expected output tuple, with passthrough for cases where input and output
row layouts match.
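
The passthrough decision can be sketched as follows. This is a
simplification under stated assumptions: a layout is modelled as a list
of column names and materialization as a column remap, whereas the
backend works on tuple descriptors. The function name
materialize_batch is hypothetical.

```python
def materialize_batch(rows, input_layout, output_layout):
    """Return rows in output_layout, reusing the batch when layouts match."""
    if input_layout == output_layout:
        # Passthrough: no copy is needed when the layouts are identical.
        return rows
    index = {name: i for i, name in enumerate(input_layout)}
    # Otherwise build each output row from the matching input columns.
    return [tuple(row[index[name]] for name in output_layout) for row in rows]


batch = [(1, "x"), (2, "y")]
same = materialize_batch(batch, ["id", "val"], ["id", "val"])
swapped = materialize_batch(batch, ["id", "val"], ["val", "id"])
```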
Changes cte_threshold default to 1, so CTEs are enabled by default and
used when the cost-based planner identifies they would be helpful and at
least 2 instances of the CTE are detected.
Adds cte-distributed.test, covering distributed planning and basic
execution.
Tested with TPC-DS queries (DecimalV2 version). Setup:
  start-impala-cluster.py --env_vars=USE_CALCITE_PLANNER=true \
    --impalad_args=--default_query_options=use_calcite_planner=true
  impala-py.test \
    tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query
Change-Id: I48f16d495d4b37be97e6a913f0eb5b94d70e199a
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/cte-buffer-node.cc
A be/src/exec/cte-buffer-node.h
A be/src/exec/cte-scan-node-ir.cc
A be/src/exec/cte-scan-node.cc
A be/src/exec/cte-scan-node.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
A be/src/exec/local-multi-sink.cc
A be/src/exec/local-multi-sink.h
A be/src/exec/sequence-node.cc
A be/src/exec/sequence-node.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
M be/src/runtime/descriptors.cc
A be/src/runtime/local-exchanger.cc
A be/src/runtime/local-exchanger.h
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
A fe/src/main/java/org/apache/impala/planner/LocalMultiSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A testdata/workloads/functional-query/queries/QueryTest/cte-distributed.test
M tests/common/test_result_verifier.py
M tests/custom_cluster/test_calcite_planner.py
34 files changed, 1,361 insertions(+), 27 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/22094/13
--
To view, visit http://gerrit.cloudera.org:8080/22094
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48f16d495d4b37be97e6a913f0eb5b94d70e199a
Gerrit-Change-Number: 22094
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>