Xuebin Su has posted comments on this change. ( http://gerrit.cloudera.org:8080/22662 )
Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................


Patch Set 5:

(4 comments)

> Patch Set 2:
> 
> (4 comments)

Thanks for reviewing!

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@9
PS2, Line 9: enables
> Wouldn't it make sense not just to enable it, but make it more "dense" in c
Thanks! I think it makes sense to set parquet_late_materialization_threshold = 1 when the collections are large. But when the collections are small, maybe we can still use the default of 20? What do you think? Thanks!


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@13
PS2, Line 13: For a collection column, late materialization takes effect only when the
            : collection
> Can you add more info on this? Is my understanding correct that the problem
Thanks! I think that is correct. Added to the commit message.

I think your ideas make sense. Maybe we can investigate more in a follow-up JIRA?


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@23
PS2, Line 23: 
> Did you run benchmarks? 2 interesting topics:
Thanks! I did some benchmarks:
- When the selectivity is high, the scanning time is almost unchanged; when the selectivity is low, late materialization for collections can reduce the scanning time by about 50%. This holds for both MT_DOP=0 and MT_DOP=1.
- Perf A/B testing shows no regression in TPC-H.


http://gerrit.cloudera.org:8080/#/c/22662/2/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/22662/2/be/src/exec/parquet/parquet-column-readers.cc@1339
PS2, Line 1339:       - rows_skipped;
> I am concerned about potential perf regressions as this modifies an atomic
Thanks! Added a member in BaseScalarColumnReader for that. Is that OK?

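For readers following the last comment, here is a minimal sketch of one common way to keep an atomic update off a per-row hot path: accumulate into a plain member and publish to the shared atomic once per batch. This is an illustration under assumptions, not the code in the patch. `SharedCounter`, `SketchColumnReader`, `SkipRows`, `Flush` and `pending` are hypothetical names, and the real member added to BaseScalarColumnReader may work differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a counter shared between scanner threads.
struct SharedCounter {
  std::atomic<int64_t> value{0};

  void Add(int64_t delta) { value.fetch_add(delta, std::memory_order_relaxed); }
  int64_t Get() const { return value.load(std::memory_order_relaxed); }
};

// Hypothetical reader; it is NOT Impala's BaseScalarColumnReader.
class SketchColumnReader {
 public:
  explicit SketchColumnReader(SharedCounter* shared) : shared_(shared) {
    assert(shared_ != nullptr);
  }

  // Hot path: a plain integer add, with no atomic read-modify-write.
  void SkipRows(int64_t num_rows) { rows_skipped_ += num_rows; }

  // Cold path: publish the accumulated count once, e.g. at the end of a batch.
  void Flush() {
    if (rows_skipped_ == 0) return;
    shared_->Add(rows_skipped_);
    rows_skipped_ = 0;
  }

  // Rows counted locally but not yet published to the shared counter.
  int64_t pending() const { return rows_skipped_; }

 private:
  SharedCounter* shared_;    // Not owned.
  int64_t rows_skipped_ = 0; // Touched only by the thread that owns the reader.
};
```

The trade-off in this sketch is that the shared counter lags behind until `Flush()` runs, which is usually acceptable for a statistic but not for a value that other threads use for control flow.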
-- 
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Apr 2025 10:35:33 +0000
Gerrit-HasComments: Yes