Xuebin Su has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22662 )

Change subject: IMPALA-3841: Enable late materialization for collections
......................................................................


Patch Set 5:

(4 comments)

> Patch Set 2:
>
> (4 comments)

Thanks for reviewing!

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@9
PS2, Line 9: enables
> Wouldn't it make sense not just to enable it, but make it more "dense" in c
Thanks! I think it makes sense to set parquet_late_materialization_threshold = 
1 when the collections are large. But when the collections are small, maybe we 
can still use the default 20?

What do you think? Thanks!
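
To make the question concrete, here is a minimal sketch of the idea, not code from the patch. The function name, the "average items per collection" input, and the cutoff of 64 are all hypothetical and chosen only for illustration; only the values 1 and 20 come from the discussion above.

```cpp
#include <cstdint>

// Default value of parquet_late_materialization_threshold mentioned above.
constexpr int32_t kDefaultThreshold = 20;
// "Dense" setting suggested for large collections.
constexpr int32_t kDenseThreshold = 1;
// Hypothetical cutoff for calling a collection "large"; an assumption,
// not a measured value.
constexpr double kLargeCollectionCutoff = 64.0;

// Hypothetical helper: picks a threshold from the average collection size.
int32_t ChooseLateMaterializationThreshold(double avg_items_per_collection) {
  return avg_items_per_collection >= kLargeCollectionCutoff
      ? kDenseThreshold
      : kDefaultThreshold;
}
```

Whether such a cutoff exists in practice, and where it would sit, would need benchmarks.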


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@13
PS2, Line 13: For a collection column, late materialization takes effect only when the
            : collection
> Can you add more info on this? Is my understanding correct that the problem
Thanks! I think that is correct. Added in the commit message.

I think your ideas make sense. Maybe we can investigate more in a follow-up 
JIRA?


http://gerrit.cloudera.org:8080/#/c/22662/2//COMMIT_MSG@23
PS2, Line 23:
> Did you run benchmarks? 2 interesting topics:
Thanks! I did some benchmarks:

- When the selectivity is high, the scanning time is almost unchanged. When the 
selectivity is low, late materialization for collections reduces the scanning 
time by about 50%. This holds for both MT_DOP=0 and MT_DOP=1.
- Perf A/B testing shows no regression in TPC-H.


http://gerrit.cloudera.org:8080/#/c/22662/2/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/22662/2/be/src/exec/parquet/parquet-column-readers.cc@1339
PS2, Line 1339:  - rows_skipped;
> I am concerned  about potential perf regressions as this modifies an atomic
Thanks! Added a member in BaseScalarColumnReader for that. Is that OK?
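
For reference, the general pattern is roughly the following sketch. It is not the actual patch: the class, member, and counter names are hypothetical stand-ins. It assumes the goal is to keep the hot path on a plain member and touch the shared atomic only once per flush.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a counter shared between scanner threads.
struct SharedScanStats {
  std::atomic<int64_t> rows_materialized{0};
};

// Hypothetical reader; not Impala's BaseScalarColumnReader.
class SketchColumnReader {
 public:
  explicit SketchColumnReader(SharedScanStats* stats) : stats_(stats) {}

  // Hot path: plain member arithmetic, no atomic operations.
  void OnRowsSeen(int64_t n) { rows_seen_ += n; }
  void OnRowsSkipped(int64_t n) { rows_skipped_ += n; }

  // Cold path (for example once per batch): a single atomic update.
  void Flush() {
    stats_->rows_materialized.fetch_add(rows_seen_ - rows_skipped_,
                                        std::memory_order_relaxed);
    rows_seen_ = 0;
    rows_skipped_ = 0;
  }

 private:
  SharedScanStats* stats_;   // Not owned.
  int64_t rows_seen_ = 0;    // Accumulated since the last Flush().
  int64_t rows_skipped_ = 0; // Accumulated since the last Flush().
};
```

With this shape, calling OnRowsSeen(100), OnRowsSkipped(40) and then Flush() adds 60 to the shared counter in one atomic operation.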



--
To view, visit http://gerrit.cloudera.org:8080/22662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia21bdfa6811408d66d74367e0a9520e20951105f
Gerrit-Change-Number: 22662
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <x...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Apr 2025 10:35:33 +0000
Gerrit-HasComments: Yes
