[Impala-ASF-CR] IMPALA-9846: Enable Aggregated Runtime Profile by Default

Surya Hebbar (Code Review) Mon, 29 Dec 2025 13:31:15 -0800

Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/23154 )


Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default
......................................................................


Patch Set 22:

So, sticking with the previous successful approach, the 'total' statistic has 
been kept, as it also provides additional analytical value. Consider the 
following examples:

1. Actual I/O & Network Cost (Capacity Planning): 'Mean' tells us if work was 
evenly distributed (skew), but 'Total' tells us the actual pressure put on the 
hardware.

    * Example: In `HDFS_SCAN_NODE (id=1)`, the `BytesRead` Mean is 239.75 MB, 
but the Total is 719.25 MB.

    * Analysis: If I only see the Mean, I know the nodes are balanced. But I 
need the Total to know that this specific query consumed over 700 MB of I/O 
bandwidth from the storage layer. The same applies to `TotalBytesSent` in the 
Exchange node (Total 107 MB vs Mean 35 MB).

2. Planner Verification (Cardinality Checks): The Query Planner operates on 
global estimates (Total Rows), not per-instance estimates.

    * Example: In the same Scan Node, `RowsRead` shows a Total of 6.00M vs a 
Mean of 2.00M.

    * Analysis: To verify whether table statistics are outdated, we compare the 
Planner's cardinality estimate against the Total actual rows. Using the Mean 
requires the user to manually find the instance count and multiply, which adds 
unnecessary friction.

3. Capturing "Noise" & Contention: For metrics that track system health (like 
context switches or faults), the 'Mean' often dilutes the severity of the issue.

    * Example: In `Fragment F00`, the `TotalThreadsInvoluntaryContextSwitches` 
has a Total of 720 (high contention) vs a Mean of 240.

    * Analysis: The Total gives us the aggregate "scheduling noise" the query 
introduced to the OS across the cluster, which is a better indicator of overall 
system impact than the per-node average.
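To make the arithmetic behind these examples concrete, here is a minimal Python sketch (not Impala's implementation) that folds one counter's per-instance values into min/mean/max/total. The `AggregatedCounter` type and the `aggregate`/`cardinality_ratio` helpers are hypothetical names. The per-instance values are illustrative, chosen only so that the Mean and Total match the `BytesRead` and `RowsRead` figures quoted above (three instances), and the planner estimate is likewise a made-up number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AggregatedCounter:
    """Summary of one counter across fragment instances (hypothetical type)."""
    name: str
    min: float
    mean: float
    max: float
    total: float
    num_instances: int


def aggregate(name: str, per_instance: Sequence[float]) -> AggregatedCounter:
    """Fold per-instance values of one counter into min/mean/max/total.

    Illustrative only; this is not how Impala builds its aggregated profile.
    """
    if not per_instance:
        raise ValueError("need at least one instance")
    total = sum(per_instance)
    n = len(per_instance)
    return AggregatedCounter(
        name=name,
        min=min(per_instance),
        mean=total / n,
        max=max(per_instance),
        total=total,
        num_instances=n,
    )


def cardinality_ratio(planner_estimate: float, counter: AggregatedCounter) -> float:
    """Actual/estimated rows. The planner estimate is global, so compare it
    against the Total rather than the per-instance Mean."""
    return counter.total / planner_estimate


# Hypothetical per-instance values for three scan instances, chosen so the
# Mean/Total match the figures quoted in the examples above.
bytes_read_mb = aggregate("BytesRead (MB)", [239.75, 239.75, 239.75])
rows_read = aggregate("RowsRead", [2_000_000, 2_000_000, 2_000_000])

# Without a printed Total, a reader must recover it as mean * num_instances.
assert bytes_read_mb.mean * bytes_read_mb.num_instances == bytes_read_mb.total
print(bytes_read_mb.total)  # 719.25
print(rows_read.total)      # 6000000

# With a (hypothetical) stale planner estimate of 1.5M rows:
print(cardinality_ratio(1_500_000, rows_read))  # 4.0
```

The Total is a single `sum()` over the raw values, whereas recovering it from the Mean needs `num_instances` as an extra input, which is the manual step described in example 2.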


Additionally, to avoid confusion in cases where a 'total' statistic is not 
meaningful, the profile representation has been updated to omit it for those 
counters: time/duration values, and counters that have already gone through 
some form of aggregation or sampling, such as 'HighWaterMark' counters like 
"LargestPartitionPercent" or "PeakMemoryUsage".
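The display rule above can be sketched as a small predicate. This is a hypothetical Python illustration, not the code in this change: the `Unit` and `Kind` enums and the `total_is_meaningful`/`format_stats` helpers are assumptions made for the example, and the values passed in are illustrative.

```python
from enum import Enum, auto
from typing import Sequence


class Unit(Enum):
    # Hypothetical subset of counter units, for illustration only.
    UNIT = auto()
    BYTES = auto()
    TIME_NS = auto()


class Kind(Enum):
    # Hypothetical counter kinds; these names are not taken from Impala's code.
    ADDITIVE = auto()         # running count, e.g. RowsRead
    HIGH_WATER_MARK = auto()  # already a max over time, e.g. PeakMemoryUsage
    SAMPLED = auto()          # derived from periodic samples


def total_is_meaningful(unit: Unit, kind: Kind) -> bool:
    """Return True if summing the counter across instances gives a useful number."""
    # Summing the durations of instances that ran in parallel is not elapsed time.
    if unit is Unit.TIME_NS:
        return False
    # Summing per-instance peaks or samples does not describe any real quantity.
    return kind is Kind.ADDITIVE


def format_stats(name: str, unit: Unit, kind: Kind, values: Sequence[float]) -> str:
    """Render min/mean/max, appending total only when it is meaningful.

    Values are formatted with no decimals, assuming integer-valued counters.
    """
    if not values:
        raise ValueError("need at least one instance")
    parts = [
        f"min={min(values):.0f}",
        f"mean={sum(values) / len(values):.0f}",
        f"max={max(values):.0f}",
    ]
    if total_is_meaningful(unit, kind):
        parts.append(f"total={sum(values):.0f}")
    return f"{name}: " + ", ".join(parts)


print(format_stats("RowsRead", Unit.UNIT, Kind.ADDITIVE, [2_000_000] * 3))
# -> RowsRead: min=2000000, mean=2000000, max=2000000, total=6000000
print(format_stats("PeakMemoryUsage", Unit.BYTES, Kind.HIGH_WATER_MARK,
                   [100_000_000, 300_000_000, 200_000_000]))
# -> PeakMemoryUsage: min=100000000, mean=200000000, max=300000000
```

Keying the decision on the counter's unit and kind, rather than on individual counter names, means a newly added high-water-mark or time counter would drop its total without a per-counter exception list.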


--
To view, visit http://gerrit.cloudera.org:8080/23154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8
Gerrit-Change-Number: 23154
Gerrit-PatchSet: 22
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Mon, 29 Dec 2025 21:30:39 +0000
Gerrit-HasComments: No
