Xuebin Su has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/22662 )

Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................

IMPALA-3841: Enable late materialization for collections

This patch enables late materialization for collections to avoid the
cost of materializing collections that will never be accessed by the
query.

For a collection column, late materialization takes effect only when the
collection column is not used in any predicate, including the `!empty()`
predicate added by the planner. Otherwise we need to read every row to
evaluate the predicate and cannot skip any rows. Therefore, to allow
late materialization, this patch skips registering the `!empty()`
predicates if the query contains zipping unnests.
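
The eligibility rule described above can be illustrated with a small
sketch. This is not the Impala implementation; `Predicate` and
`late_materializable` are hypothetical names introduced only to show
why an `!empty()` predicate on a collection prevents it from being
skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a planner predicate."""
    text: str
    columns: frozenset  # names of the columns the predicate references


def late_materializable(collection_columns, predicates):
    """Return the collection columns not referenced by any predicate.

    Assumption: a column referenced by a predicate must be read for
    every row, so only unreferenced columns can be skipped.
    """
    used = set()
    for pred in predicates:
        used |= pred.columns
    return [col for col in collection_columns if col not in used]


# With `!empty(arr1)` registered, arr1 has to be read for every row.
with_empty = [Predicate("!empty(arr1)", frozenset({"arr1"})),
              Predicate("id < 10", frozenset({"id"}))]
print(late_materializable(["arr1", "arr2"], with_empty))

# Without it, both collections remain candidates for skipping.
without_empty = [Predicate("id < 10", frozenset({"id"}))]
print(late_materializable(["arr1", "arr2"], without_empty))
```

In this toy model, dropping the `!empty()` predicate is what makes
`arr1` a candidate again, which mirrors the motivation given for the
planner change.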

The late materialization threshold is set to 1 in HdfsParquetScanner
when there is any collection that can be skipped.

This patch also adds the detail of `HdfsScanner::parse_status_` to the
error message returned by the HdfsParquetScanner to help figure out the
root cause.

Testing:
- Added a runtime profile counter NumRowsSkippedByLateMaterialization
  to record the total number of top-level rows skipped by late
  materialization for all columns. The counter counts only rows that
  are skipped individually, not rows skipped as part of a whole page.
- Added e2e test cases in test_parquet_late_materialization.py to ensure
  that late materialization works using the new counter.
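
One possible reading of the counter semantics above, as a sketch: rows
that fail the filter are counted only when their page is partially
skipped, while rows in a page that is skipped entirely are not counted.
The function name, the fixed `page_size`, and the boolean `survives`
list are assumptions made for illustration; the real scanner works on
Parquet pages of varying size.

```python
def count_skipped_rows(survives, page_size):
    """Split skipped rows into (individually skipped, skipped as whole pages).

    survives:  list of bools, True if the top-level row passed the filters.
    page_size: assumed fixed number of top-level rows per page.
    """
    individually = 0
    as_pages = 0
    for start in range(0, len(survives), page_size):
        page = survives[start:start + page_size]
        dead = page.count(False)
        if dead == len(page):
            # No row in the page survives, so the page is skipped as a unit.
            as_pages += dead
        else:
            # The page must be read; filtered rows are skipped one by one.
            individually += dead
    return individually, as_pages


survives = [True, False, False, False, False, True, False, True]
print(count_skipped_rows(survives, page_size=2))
```

Under this reading, only the first number would contribute to
NumRowsSkippedByLateMaterialization.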

Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-complex-column-reader.h
M be/src/exec/parquet/parquet-level-decoder.h
M be/src/exec/parquet/parquet-struct-column-reader.cc
M be/src/exec/scratch-tuple-batch.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/zipping-unnest.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
M tests/query_test/test_parquet_late_materialization.py
15 files changed, 178 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/22662/7
--
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 7
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
