Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22062 )

Change subject: IMPALA-13487: Add profile counters for memory allocation in 
parquet scanners
......................................................................


Patch Set 22:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/hdfs-scanner.h@170
PS21, Line 170:
> It's a little weird this is a public interface, I would have made it protec
Yeah, it seems an internal method that shouldn't be exposed. Moved it to the 
protected section.


http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/parquet/hdfs-parquet-scanner.cc@75
PS21, Line 75:     "Not part of DecompressionTime.");
> Can we say where they are tracked? Do they show up in ScannerIoWaitTime or
It's hard to say. data_page_pool_->FreeAll() is used in two places:
* ParquetColumnChunkReader::ReleaseResourcesOfLastPage(). Duration of this is 
part of MaterializeTupleTime.
* ParquetColumnChunkReader::Close(). This is used in 
BaseScalarColumnReader::Close() which is used in multiple places. When it's 
used in finishing a RowGroup to start reading a new RowGroup, e.g. in 
FlushRowGroupResources(), the duration is tracked in MaterializeTupleTime. When 
it's used in HdfsParquetScanner::Close(), it's not part of MaterializeTupleTime.

They are not part of ScannerIoWaitTime. But I'm not sure whether they are part 
of ScannerThreadsSysTime. It depends on whether free() will go into the kernal 
space. I think mostly we just release the memory to TCMalloc. But TCMalloc 
might return some memory to the OS in some cases.


http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/scratch-tuple-batch.h
File be/src/exec/scratch-tuple-batch.h:

http://gerrit.cloudera.org:8080/#/c/22062/21/be/src/exec/scratch-tuple-batch.h@221
PS21, Line 221:   Tuple* GetTuple(int idx) const {
> Any reason you changed the name? Seems like an arbitrary change.
'tuple_idx' is a field name so try to avoid the conflict.
https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/be/src/exec/scratch-tuple-batch.h#L53



--
To view, visit http://gerrit.cloudera.org:8080/22062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Gerrit-Change-Number: 22062
Gerrit-PatchSet: 22
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Apr 2025 03:03:40 +0000
Gerrit-HasComments: Yes

Reply via email to