Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22662 )

Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@9
PS2, Line 9: enables
> Thanks! I think it makes sense to set parquet_late_materialization_threshol
I lean towards setting it to 1 if there is any collection column whose
materialization can be skipped. This could also make sense for other types
that are expensive to materialize, e.g. timestamps that need timezone
conversion, but I would not touch that for now.

Using a threshold of 1 would also help with testing: there would be no need to
cover the collection case with multiple parquet_late_materialization_threshold
values.
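
As a rough sketch of the suggestion above (all names are hypothetical, not
Impala's actual code, and it assumes a negative threshold means late
materialization is turned off):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-column description, for illustration only.
struct ColumnInfo {
  bool is_collection;
  bool can_skip_materialization;
};

// Returns the threshold to use for a scan. If any collection column can have
// its materialization skipped, force the threshold to 1; otherwise keep the
// configured value. Assumption: a negative configured value disables the
// feature, so it is returned unchanged.
int32_t EffectiveLateMaterializationThreshold(
    int32_t configured_threshold, const std::vector<ColumnInfo>& columns) {
  if (configured_threshold < 0) return configured_threshold;
  for (const ColumnInfo& col : columns) {
    if (col.is_collection && col.can_skip_materialization) return 1;
  }
  return configured_threshold;
}
```

With this shape, tests for the collection case would only need to exercise a
single threshold value.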


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@13
PS2, Line 13: For a collection column, late materialization takes effect only when the
            : collection
> Thanks! I think that is correct. Added in the commit message.
Agree, no need to do it here.


http://gerrit.cloudera.org:8080/#/c/22662/5/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/22662/5/be/src/exec/parquet/parquet-column-readers.cc@1530
PS5, Line 1530:   if (num_rows_skipped_by_late_materialization_counter_ > 0) {
Shouldn't num_rows_skipped_by_late_materialization_counter_ be reset to 0 here?
Also, this information will be lost if scanning ends before reaching the end
of the column chunk, for example if the query is cancelled. Updating the
counter in Close() could help with this.
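
A minimal sketch of the pattern described above, using hypothetical stand-in
types (ProfileCounter and SketchColumnReader are not Impala classes): the
pending count is published both at the end of a column chunk and in Close(),
and it is reset after each publish so it cannot be counted twice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a shared profile counter.
struct ProfileCounter {
  int64_t value = 0;
  void Add(int64_t delta) { value += delta; }
};

// Hypothetical reader that accumulates skipped rows locally and publishes
// them to the shared counter in batches.
class SketchColumnReader {
 public:
  explicit SketchColumnReader(ProfileCounter* shared) : shared_(shared) {
    assert(shared != nullptr);
  }

  // Record rows whose materialization was skipped.
  void SkipRows(int64_t num_rows) {
    assert(num_rows >= 0);
    pending_skipped_rows_ += num_rows;
  }

  // Called when the end of a column chunk is reached.
  void OnEndOfColumnChunk() { FlushSkippedRows(); }

  // Called on every exit path, including cancellation, so rows skipped in a
  // partially read chunk are still reported.
  void Close() { FlushSkippedRows(); }

 private:
  // Publish the pending count, then reset it to 0 to avoid double counting.
  void FlushSkippedRows() {
    if (pending_skipped_rows_ > 0) {
      shared_->Add(pending_skipped_rows_);
      pending_skipped_rows_ = 0;
    }
  }

  ProfileCounter* shared_;
  int64_t pending_skipped_rows_ = 0;
};
```

Because the flush resets the pending value, calling it from both places is
safe even when the chunk was read to the end before Close().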



--
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Apr 2025 08:37:37 +0000
Gerrit-HasComments: Yes
