Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23012 )
Change subject: IMPALA-9874: Skip IO for late materialized columns ...................................................................... Patch Set 13: (5 comments) Thank you for working on this! http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG@23 PS13, Line 23: As a result, IO bound queries with low selectivity can run : significantly faster. Can queries with high selectivity regress? E.g. because IO scheduling is delayed for them? http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG@25 PS13, Line 25: Could you please share some numbers? Did you run perf AB tests on tpch/tpcds? http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/orc/hdfs-orc-scanner.cc File be/src/exec/orc/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/orc/hdfs-orc-scanner.cc@926 PS13, Line 926: rows_selected_in_row_group_ = false; Does it do anything for ORC? http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc@259 PS13, Line 259: late_materialization_ = !( : // Late materialization will be disabled if : filter_readers_.empty() // no columns are used for filtering, or : || non_filter_readers_.empty() // all columns are used for filtering, or : || late_materialization_threshold_ : < 0 // the threshold is set to negative in query options, or : || filter_readers_[0]->max_rep_level() : > 0 // the first filter column belongs to a repeated field, or : || HasStructColumnReader( : non_filter_readers_) // a non-filter reader has struct readers. nit: could be moved to its own method Btw, thanks for adding the comments, it makes it much easier to understand! http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc@1051 PS13, Line 1051: RETURN_IF_ERROR( : BaseScalarColumnReader::StartScans(non_dict_filterable_filter_readers_)); : status = : BaseScalarColumnReader::InitDictionaries(non_dict_filterable_filter_readers_); We can see this pattern 3X in this file, I think it's worth to create a function for this. -- To view, visit http://gerrit.cloudera.org:8080/23012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a052b028220517503e634e3f916d1fbd60eb65d Gerrit-Change-Number: 23012 Gerrit-PatchSet: 13 Gerrit-Owner: Xuebin Su <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Xuebin Su <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 10 Sep 2025 10:47:03 +0000 Gerrit-HasComments: Yes
