Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23012 )

Change subject: IMPALA-9874: Skip IO for late materialized columns
......................................................................


Patch Set 13:

(5 comments)

Thank you for working on this!

http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG@23
PS13, Line 23: As a result, IO bound queries with low selectivity can run
             : significantly faster.
Can queries with high selectivity regress? E.g. because IO scheduling is 
delayed for them?


http://gerrit.cloudera.org:8080/#/c/23012/13//COMMIT_MSG@25
PS13, Line 25:
Could you please share some numbers? Did you run perf AB tests on tpch/tpcds?


http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/orc/hdfs-orc-scanner.cc@926
PS13, Line 926: rows_selected_in_row_group_ = false;
Does it do anything for ORC?


http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc@259
PS13, Line 259:   late_materialization_ = !(
              :       // Late materialization will be disabled if
              :       filter_readers_.empty() // no columns are used for 
filtering, or
              :       || non_filter_readers_.empty() // all columns are used 
for filtering, or
              :       || late_materialization_threshold_
              :           < 0 // the threshold is set to negative in query 
options, or
              :       || filter_readers_[0]->max_rep_level()
              :           > 0 // the first filter column belongs to a repeated 
field, or
              :       || HasStructColumnReader(
              :           non_filter_readers_) // a non-filter reader has 
struct readers.
nit: could be moved to its own method

Btw, thanks for adding the comments, it makes it much easier to understand!


http://gerrit.cloudera.org:8080/#/c/23012/13/be/src/exec/parquet/hdfs-parquet-scanner.cc@1051
PS13, Line 1051:       RETURN_IF_ERROR(
               :           
BaseScalarColumnReader::StartScans(non_dict_filterable_filter_readers_));
               :       status =
               :           
BaseScalarColumnReader::InitDictionaries(non_dict_filterable_filter_readers_);
We can see this pattern 3X in this file, I think it's worth to create a 
function for this.



--
To view, visit http://gerrit.cloudera.org:8080/23012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a052b028220517503e634e3f916d1fbd60eb65d
Gerrit-Change-Number: 23012
Gerrit-PatchSet: 13
Gerrit-Owner: Xuebin Su <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 10 Sep 2025 10:47:03 +0000
Gerrit-HasComments: Yes

Reply via email to