Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22062 )

Change subject: IMPALA-13487: Add profile counters for memory allocation
......................................................................


Patch Set 19:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22062/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22062/18//COMMIT_MSG@17
PS18, Line 17:     - ScratchBatchMemAllocDuration
             :     - ScratchBatchMemFreeDuration
             :     - ScratchBatchMemAllocBytes
> Why there is no ScratchBatchMemFreeBytes when there is 
> ParquetDataPagePoolFreeBytes?

free() is actually not used in aux_mem_pool and tuple_mem_pool of the scratch 
batch since their chunks are all attached to the output RowBatches. Related 
codes are in ScratchTupleBatch::FinalizeTupleTransfer() and 
ScratchTupleBatch::ReleaseResources(). So when I added 
ScratchBatchMemFreeBytes, the values are always 0. I think 
ScratchBatchMemFreeDuration is enough to indicate free() is not used. But I can 
add ScratchBatchMemFreeBytes back for completeness.

> Consider using nested counters for counters derived from MemPoolCounters, 
> just like "Buffer pool". Something like this maybe:

Agree that it'd be more human-readable. But shorter names like AllocBytes make 
it hard to search them by grep, e.g. we need to make sure it's under "MemPool 
of ScratchBatch". So we'd better keep the full names like we does for 
TotalThreadsSysTime and TotalThreadsUserTime. Probably not worth it for 3-4 
counters since they are already lined up together.


http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc
File be/src/exec/hdfs-columnar-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@71
PS18, Line 71: PROFILE_DEFINE_SUMMARY_STATS_TIMER(ScratchBatchMemAllocDuration, 
DEBUG,
             :     "Stats of time spent in malloc() used by MemPools of the 
scratch batch. These are "
             :     "part of MaterializeTupleTime. If the sum accounts for a 
significant portion of it, "
             :     "tuple materialization is dominated by memory allocation 
time.");
             : PROFILE_DEFINE_SUMMARY_STATS_TIMER(ScratchBatchMemFreeDuration, 
DEBUG,
             :     "Stats of time spent in free() used by MemPools of the 
scratch batch. Same as "
             :     "ScratchBatchMemAllocDuration, these are part of 
MaterializeTupleTime.");
             : PROFILE_DEFINE_SUMMARY_STATS_COUNTER(ScratchBatchMemAllocBytes, 
DEBUG, TUnit::BYTES,
             :     "Stats of bytes alloc
> nit: In your commit message, you write a very good instruction on how to co
Thanks! This is a good point. Even when I reanalyze the relationship of these 
counters I found I misunderstood something - ParquetDataPagePoolFreeDuration is 
not part of DecompressionTime.


http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@338
PS18, Line 338: }
              :
> Why only aggregate the counters at CloseInternal?
SummaryStatsCounter::SetStats() and SummaryStatsCounter::Merge() requires 
holding the lock of the counter. So only use them at the end, i.e. 
CloseInternal(), to reduce the contention between scanner threads. But I 
haven't measured the performance impact yet.


http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@344
PS18, Line 344: void HdfsColumnarScanner::CloseInternal() {
              :   // Only aggregate the counters at the end to reduce the 
contention between scanner
              :   // threads.
              :   MemPoolCounters counters = scratch_batch_->tuple_mem_pool.
> nit: Please add whitespace before 'counter' reassignment, or put it in diff
Done



--
To view, visit http://gerrit.cloudera.org:8080/22062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Gerrit-Change-Number: 22062
Gerrit-PatchSet: 19
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Apr 2025 13:04:19 +0000
Gerrit-HasComments: Yes

Reply via email to