Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22062 )
Change subject: IMPALA-13487: Add profile counters for memory allocation ...................................................................... Patch Set 19: (4 comments) http://gerrit.cloudera.org:8080/#/c/22062/18//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22062/18//COMMIT_MSG@17 PS18, Line 17: - ScratchBatchMemAllocDuration : - ScratchBatchMemFreeDuration : - ScratchBatchMemAllocBytes > Why there is no ScratchBatchMemFreeBytes when there is > ParquetDataPagePoolFreeBytes? free() is actually not used in aux_mem_pool and tuple_mem_pool of the scratch batch since their chunks are all attached to the output RowBatches. Related codes are in ScratchTupleBatch::FinalizeTupleTransfer() and ScratchTupleBatch::ReleaseResources(). So when I added ScratchBatchMemFreeBytes, the values are always 0. I think ScratchBatchMemFreeDuration is enough to indicate free() is not used. But I can add ScratchBatchMemFreeBytes back for completeness. > Consider using nested counters for counters derived from MemPoolCounters, > just like "Buffer pool". Something like this maybe: Agree that it'd be more human-readable. But shorter names like AllocBytes make it hard to search them by grep, e.g. we need to make sure it's under "MemPool of ScratchBatch". So we'd better keep the full names like we does for TotalThreadsSysTime and TotalThreadsUserTime. Probably not worth it for 3-4 counters since they are already lined up together. http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc File be/src/exec/hdfs-columnar-scanner.cc: http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@71 PS18, Line 71: PROFILE_DEFINE_SUMMARY_STATS_TIMER(ScratchBatchMemAllocDuration, DEBUG, : "Stats of time spent in malloc() used by MemPools of the scratch batch. These are " : "part of MaterializeTupleTime. If the sum accounts for a significant portion of it, " : "tuple materialization is dominated by memory allocation time."); : PROFILE_DEFINE_SUMMARY_STATS_TIMER(ScratchBatchMemFreeDuration, DEBUG, : "Stats of time spent in free() used by MemPools of the scratch batch. Same as " : "ScratchBatchMemAllocDuration, these are part of MaterializeTupleTime."); : PROFILE_DEFINE_SUMMARY_STATS_COUNTER(ScratchBatchMemAllocBytes, DEBUG, TUnit::BYTES, : "Stats of bytes alloc > nit: In your commit message, you write a very good instruction on how to co Thanks! This is a good point. Even when I reanalyze the relationship of these counters I found I misunderstood something - ParquetDataPagePoolFreeDuration is not part of DecompressionTime. http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@338 PS18, Line 338: } : > Why only aggregate the counters at CloseInternal? SummaryStatsCounter::SetStats() and SummaryStatsCounter::Merge() requires holding the lock of the counter. So only use them at the end, i.e. CloseInternal(), to reduce the contention between scanner threads. But I haven't measured the performance impact yet. http://gerrit.cloudera.org:8080/#/c/22062/18/be/src/exec/hdfs-columnar-scanner.cc@344 PS18, Line 344: void HdfsColumnarScanner::CloseInternal() { : // Only aggregate the counters at the end to reduce the contention between scanner : // threads. : MemPoolCounters counters = scratch_batch_->tuple_mem_pool. > nit: Please add whitespace before 'counter' reassignment, or put it in diff Done -- To view, visit http://gerrit.cloudera.org:8080/22062 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710 Gerrit-Change-Number: 22062 Gerrit-PatchSet: 19 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Xuebin Su <x...@cloudera.com> Gerrit-Comment-Date: Thu, 10 Apr 2025 13:04:19 +0000 Gerrit-HasComments: Yes