Hello Riza Suminto, Peter Rozsa, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23705
to look at the new patch set (#6).
Change subject: IMPALA-13756: Fix Iceberg V2 count(*) optimization for complex
queries
......................................................................
IMPALA-13756: Fix Iceberg V2 count(*) optimization for complex queries
We optimize plain count(*) queries on Iceberg tables the following way:
AGGREGATE
COUNT(*)
|
UNION ALL
/ \
/ \
/ \
SCAN all ANTI JOIN
datafiles / \
without / \
deletes SCAN SCAN
datafiles deletes
||
rewrite
||
\/
ArithmethicExpr: LHS + RHS
/ \
/ \
/ \
record_count AGGREGATE
of all COUNT(*)
datafiles |
without ANTI JOIN
deletes / \
/ \
SCAN SCAN
datafiles deletes
This optimization consists of two parts:
1 Rewriting count(*) expression to count(*) + "record_count" (of data
files without deletes)
2 In IcebergScanPlanner we only need to consruct the right side of
the original UNION ALL operator, i.e.:
ANTI JOIN
/ \
/ \
SCAN SCAN
datafiles deletes
SelectStmt decides whether we can do the count(*) optimization, and if
so, does the following:
1: SelectStmt sets 'TotalRecordsNumV2' in the analyzer, then during the
expression rewrite phase the CountStarToConstRule rewrites the
count(*) to count(*) + record_count
2: SelectStmt sets "OptimizeCountStarForIcebergV2" in the query context
then IcebergScanPlanner creates plan accordingly.
This mechanism works for simple queries, but can turn on count(*)
optimization in IcebergScanPlanner for all Iceberg V2 tables in complex
queries. Even if only one subquery enables count(*) optimization during
analysis.
With this patch the followings change:
1: We introduce IcebergV2CountStarAccumulator which we use instead of
the ArithmethicExpr. So after rewrite we still know if count(*)
optimization should be enabled for the planner.
2: Instead of using the query context, we pass the information to the
IcebergScanPlanner via the TableRef object.
Testing
* e2e tests
Change-Id: I1940031298eb634aa82c3d32bbbf16bce8eaf874
---
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergV2CountStarAccumulator.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/rewrite/CountStarToConstRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
A
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-count-star-optimization-in-complex-query.test
M tests/query_test/test_iceberg.py
9 files changed, 197 insertions(+), 12 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/23705/6
--
To view, visit http://gerrit.cloudera.org:8080/23705
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1940031298eb634aa82c3d32bbbf16bce8eaf874
Gerrit-Change-Number: 23705
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>