Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/22662 )
Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@9
PS2, Line 9: enables
> Thanks! I think it makes sense to set parquet_late_materialization_threshol
I lean towards setting it to 1 if there is any collection column whose materialization can be skipped. This could also make sense for other types that are expensive to materialize, e.g. timestamps if there is timezone conversion, but I would not touch that for now.

Using a threshold of 1 would also help with testing: the collection case would not need to be covered with multiple parquet_late_materialization_threshold values.


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@13
PS2, Line 13: For a collection column, late materialization takes effect only when the
: collection
> Thanks! I think that is correct. Added in the commit message.
Agree, no need to do it here.


http://gerrit.cloudera.org:8080/#/c/22662/5/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/22662/5/be/src/exec/parquet/parquet-column-readers.cc@1530
PS5, Line 1530: if (num_rows_skipped_by_late_materialization_counter_ > 0) {
Shouldn't num_rows_skipped_by_late_materialization_counter_ be reset to zero here?

Also, this info will be lost if scanning doesn't end by hitting the end of the column chunk, for example if the query is cancelled. Updating the counter in Close() could help with this.
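
To illustrate the last comment, here is a minimal sketch of the pattern, not the actual Impala code. ProfileCounter, SketchColumnReader, SkipRows(), OnColumnChunkEnd() and FlushSkippedRows() are hypothetical names made up for this example; only the idea of flushing a locally accumulated count both at the end of the column chunk and in Close(), and resetting it after each flush, comes from the comment above.

```cpp
#include <cstdint>

// Hypothetical stand-in for a profile counter; this is NOT Impala's
// RuntimeProfile::Counter, just a plain integer wrapper for the sketch.
struct ProfileCounter {
  int64_t value = 0;
  void Add(int64_t delta) { value += delta; }
};

// Hypothetical, heavily simplified column reader.
class SketchColumnReader {
 public:
  explicit SketchColumnReader(ProfileCounter* profile_counter)
    : profile_counter_(profile_counter) {}

  // Accumulate rows skipped by late materialization locally.
  void SkipRows(int64_t num_rows) { num_rows_skipped_ += num_rows; }

  // Normal path: the end of the current column chunk was reached.
  void OnColumnChunkEnd() { FlushSkippedRows(); }

  // Runs on every exit path, including cancellation, so a partially
  // scanned column chunk still reports its skipped rows.
  void Close() { FlushSkippedRows(); }

 private:
  void FlushSkippedRows() {
    if (num_rows_skipped_ > 0) {
      profile_counter_->Add(num_rows_skipped_);
      // Reset after flushing so the same rows are never counted twice.
      num_rows_skipped_ = 0;
    }
  }

  ProfileCounter* profile_counter_;  // not owned
  int64_t num_rows_skipped_ = 0;
};
```

Because the local count is reset after each flush, calling Close() after OnColumnChunkEnd(), or calling Close() twice, does not add anything extra to the profile counter.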
-- 
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Apr 2025 08:37:37 +0000
Gerrit-HasComments: Yes