Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22062 )
Change subject: IMPALA-13487: Add profile counters for memory allocation in parquet scanners ...................................................................... Patch Set 22: (3 comments) http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/hdfs-scanner.h@170 PS21, Line 170: > It's a little weird this is a public interface, I would have made it protec Yeah, it seems an internal method that shouldn't be exposed. Moved it to the protected section. http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/parquet/hdfs-parquet-scanner.cc@75 PS21, Line 75: "Not part of DecompressionTime."); > Can we say where they are tracked? Do they show up in ScannerIoWaitTime or It's hard to say. data_page_pool_->FreeAll() is used in two places: * ParquetColumnChunkReader::ReleaseResourcesOfLastPage(). Duration of this is part of MaterializeTupleTime. * ParquetColumnChunkReader::Close(). This is used in BaseScalarColumnReader::Close() which is used in multiple places. When it's used in finishing a RowGroup to start reading a new RowGroup, e.g. in FlushRowGroupResources(), the duration is tracked in MaterializeTupleTime. When it's used in HdfsParquetScanner::Close(), it's not part of MaterializeTupleTime. They are not part of ScannerIoWaitTime. But I'm not sure whether they are part of ScannerThreadsSysTime. It depends on whether free() will go into the kernal space. I think mostly we just release the memory to TCMalloc. But TCMalloc might return some memory to the OS in some cases. http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/scratch-tuple-batch.h File be/src/exec/scratch-tuple-batch.h: http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/scratch-tuple-batch.h@221 PS21, Line 221: Tuple* GetTuple(int idx) const { > Any reason you changed the name? Seems like an arbitrary change. 'tuple_idx' is a field name so try to avoid the conflict. https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/be/src/exec/scratch-tuple-batch.h#L53 -- To view, visit http://gerrit.cloudera.org:8080/22062 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710 Gerrit-Change-Number: 22062 Gerrit-PatchSet: 22 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Xuebin Su <x...@cloudera.com> Gerrit-Comment-Date: Tue, 15 Apr 2025 03:03:40 +0000 Gerrit-HasComments: Yes