Xuebin Su has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/22662 )
Change subject: IMPALA-3841: Enable late materialization for collections ...................................................................... IMPALA-3841: Enable late materialization for collections This patch enables late materialization for collections to avoid the cost of materializing collections that will never be accessed by the query. For a collection column, late materialization takes effect only when the collection column is not used in any predicate, including the `!empty()` predicate added by the planner. Otherwise we need to read every row to evaluate the predicate and cannot skip any. Therefore, this patch skips registering the `!empty()` predicates if the query contains zipping unnests for late materialization. This patch also adds the detail of `HdfsScanner::parse_status_` to the error message returned by the HdfsParquetScanner to help figure out the root cause. Testing: - Added a runtime profile counter NumRowsSkippedByLateMaterialization to record the total number of top-level rows skipped by late materialization for all columns. - Added e2e test cases in parquet-late-materialization.test to ensure that late materialization works using the counter added. Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-collection-column-reader.cc M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h M be/src/exec/parquet/parquet-complex-column-reader.h M be/src/exec/parquet/parquet-level-decoder.h M be/src/exec/parquet/parquet-struct-column-reader.cc M be/src/exec/scratch-tuple-batch.h M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/workloads/functional-planner/queries/PlannerTest/zipping-unnest.test M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test 13 files changed, 121 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/22662/5 -- To view, visit http://gerrit.cloudera.org:8080/22662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f Gerrit-Change-Number: 22662 Gerrit-PatchSet: 5 Gerrit-Owner: Xuebin Su <x...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>