Xuebin Su has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/22662 )
Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................

IMPALA-3841: Enable late materialization for collections

This patch enables late materialization for collections to avoid the
cost of materializing collections that will never be accessed by the
query.

For a collection column, late materialization takes effect only when the
collection column is not used in any predicate, including the `!empty()`
predicate added by the planner. Otherwise we need to read every row to
evaluate the predicate and cannot skip any. Therefore, to allow late
materialization, this patch skips registering the `!empty()` predicates
if the query contains zipping unnests.

The late materialization threshold is set to 1 in HdfsParquetScanner
when there is any collection that can be skipped.

This patch also adds the details of `HdfsScanner::parse_status_` to the
error message returned by the HdfsParquetScanner to help figure out the
root cause.

Testing:
- Added a runtime profile counter NumRowsSkippedByLateMaterialization to
  record the total number of top-level rows skipped by late
  materialization for all columns. The counter only counts rows that
  are skipped individually, not rows skipped as part of a whole page.
- Added e2e test cases in test_parquet_late_materialization.py to ensure
  that late materialization works using the new counter.

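For readers unfamiliar with the technique, the rule described above can be
illustrated with a minimal Python sketch. It is not Impala's implementation:
`scan_with_late_materialization`, the row layout, and the `rows_skipped`
counter are hypothetical names chosen for illustration, and the sketch ignores
page-level skipping entirely.

```python
def scan_with_late_materialization(rows, predicate, lazy_columns):
    """Toy row-at-a-time scan (hypothetical, for illustration only).

    rows:         list of dicts, one dict per top-level row
    predicate:    callable evaluated on the eagerly read columns only
    lazy_columns: set of column names that are materialized only for
                  rows that pass the predicate
    Returns (output_rows, rows_skipped).
    """
    output = []
    rows_skipped = 0
    for row in rows:
        # Only the non-lazy columns are visible to the predicate.
        eager = {k: v for k, v in row.items() if k not in lazy_columns}
        if not predicate(eager):
            # The lazy columns of this row are never materialized.
            rows_skipped += 1
            continue
        # The row survived: materialize the lazy columns as well.
        output.append(dict(row))
    return output, rows_skipped


if __name__ == "__main__":
    rows = [
        {"id": 1, "arr": [1, 2]},
        {"id": 2, "arr": []},
        {"id": 3, "arr": [3]},
    ]
    out, skipped = scan_with_late_materialization(
        rows, lambda r: r["id"] > 1, {"arr"})
    print(out, skipped)
```

In this sketch, a predicate that reads `arr` (for example a non-empty check)
raises `KeyError` while `arr` is lazy, which mirrors why a collection used in
any predicate cannot be late-materialized.
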
Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-complex-column-reader.h
M be/src/exec/parquet/parquet-level-decoder.h
M be/src/exec/parquet/parquet-struct-column-reader.cc
M be/src/exec/scratch-tuple-batch.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/zipping-unnest.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
M tests/query_test/test_parquet_late_materialization.py
15 files changed, 178 insertions(+), 33 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/22662/7
--
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 7
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>