Xuebin Su has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/22662 )

Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................

IMPALA-3841: Enable late materialization for collections

This patch enables late materialization for collections to avoid the
cost of materializing collections that will never be accessed by the
query.

For a collection column, late materialization takes effect only when the
column is not used in any predicate, including the `!empty()` predicate
added by the planner. If the column is used in a predicate, every row
must be read to evaluate that predicate, so no rows can be skipped.
Therefore, to allow late materialization, this patch skips registering
the `!empty()` predicates if the query contains zipping unnests.
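
The eligibility rule above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code in this
patch: `ColumnSketch`, `CanLateMaterialize` and `CountSkippableRows` are
hypothetical names that do not exist in Impala, and the real scanner
works on column readers and scratch batches rather than on a vector of
flags.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a scanned column; not an Impala class.
struct ColumnSketch {
  std::string name;
  bool is_collection;
  // True if any predicate reads this column, including a planner-added
  // predicate such as !empty().
  bool used_in_predicate;
};

// A column can be materialized late only if no predicate reads it.
// Otherwise every row has to be read to evaluate the predicate.
inline bool CanLateMaterialize(const ColumnSketch& col) {
  return !col.used_in_predicate;
}

// Counts the top-level rows that a late-materialized column could skip,
// i.e. the rows already rejected by predicates on other columns.
inline int64_t CountSkippableRows(const std::vector<bool>& row_passes_filter) {
  int64_t skipped = 0;
  for (bool passes : row_passes_filter) {
    if (!passes) ++skipped;
  }
  return skipped;
}
```

In this sketch, a collection column that carries an `!empty()` predicate
has `used_in_predicate` set and is therefore never eligible, which is
the situation the patch avoids for zipping unnests.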

This patch also adds the details of `HdfsScanner::parse_status_` to the
error message returned by HdfsParquetScanner to help identify the root
cause of the error.

Testing:
- Added a runtime profile counter NumRowsSkippedByLateMaterialization
  to record the total number of top-level rows skipped by late
  materialization for all columns.
- Added e2e test cases in parquet-late-materialization.test that use the
  new counter to verify that late materialization takes effect.

Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-complex-column-reader.h
M be/src/exec/parquet/parquet-level-decoder.h
M be/src/exec/parquet/parquet-struct-column-reader.cc
M be/src/exec/scratch-tuple-batch.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/zipping-unnest.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
13 files changed, 121 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/22662/5
--
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
