Surya Hebbar has posted comments on this change. (
http://gerrit.cloudera.org:8080/23154 )
Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default
......................................................................
Patch Set 22:
So, sticking with the previous successful approach, the 'total' statistic has
been used as it also provides additional analytical value. Consider the
following examples:
1. Actual I/O & Network Cost (Capacity Planning): 'Mean' tells us if work was
evenly distributed (skew), but 'Total' tells us the actual pressure put on the
hardware.
* Example: In `HDFS_SCAN_NODE (id=1)`, the `BytesRead` Mean is 239.75 MB,
but the Total is 719.25 MB.
* Analysis: If I only see the Mean, I know the nodes are balanced. But I
need the Total to know that this specific query read about 719 MB (roughly
0.7 GB) from the storage layer. The same applies to `TotalBytesSent` in the
Exchange node (Total 107 MB vs Mean 35 MB).
2. Planner Verification (Cardinality Checks): The Query Planner operates on
global estimates (Total Rows), not per-instance estimates.
* Example: In the same Scan Node, `RowsRead` shows a Total of 6.00M vs a
Mean of 2.00M.
* Analysis: To verify if table statistics are outdated, we compare the
Planner's cardinality estimate against the Total actual rows. Using the Mean
requires the user to manually find the instance count and multiply, which is
friction-heavy.
3. Capturing "Noise" & Contention: For metrics that track system health (like
context switches or faults), the 'Mean' often dilutes the severity of the issue.
* Example: In `Fragment F00`, the `TotalThreadsInvoluntaryContextSwitches`
has a Total of 720 (high contention) vs a Mean of 240.
* Analysis: The Total gives us the aggregate "scheduling noise" the query
introduced to the OS across the cluster, which is a better indicator of overall
system impact than the per-node average.
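
To make the arithmetic behind these examples concrete, here is a minimal Python sketch. It is only an illustration: the per-instance values are hypothetical (the profile figures quoted above give only Mean and Total; three equal instances are assumed because that reproduces both), and `planner_estimate` is a made-up number, not taken from the actual plan.

```python
def summarize(per_instance):
    """Return (mean, total) for one counter's per-instance values."""
    total = sum(per_instance)
    return total / len(per_instance), total

# Hypothetical per-instance values, chosen to match the quoted Mean/Total.
rows_read = [2_000_000, 2_000_000, 2_000_000]   # RowsRead in the scan node
ctx_switches = [240, 240, 240]                  # involuntary context switches

rows_mean, rows_total = summarize(rows_read)
ctx_mean, ctx_total = summarize(ctx_switches)

# With only the Mean, the reader has to find the instance count and multiply.
instance_count = len(rows_read)
reconstructed_total = rows_mean * instance_count

# Hypothetical planner cardinality estimate, for illustration only.
planner_estimate = 1_500_000
estimate_ratio = rows_total / planner_estimate  # > 1 means the planner underestimated

print(f"RowsRead: mean={rows_mean:.0f} total={rows_total}")
print(f"Context switches: mean={ctx_mean:.0f} total={ctx_total}")
print(f"Actual/estimated rows: {estimate_ratio:.1f}x")
```

With 'total' printed in the profile, the comparison against the planner estimate needs no extra lookup of the instance count.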
Additionally, to avoid the confusion the 'total' statistic causes for some
counters, the profile representation has been updated to omit 'total' in those
cases: time/duration values, and counters that have already gone through some
form of aggregation or sampling, such as 'HighWaterMark' counters like
"LargestPartitionPercent" or "PeakMemoryUsage".
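
The exclusion rule described above could be sketched roughly as follows. This is not the code in the patch: the `Counter` class, the unit labels in `TIME_UNITS`, and the `high_water_mark` flag are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical unit labels for time/duration counters.
TIME_UNITS = {"TIME_NS", "TIME_US", "TIME_MS", "TIME_S"}

@dataclass
class Counter:
    name: str
    unit: str                      # hypothetical unit label, e.g. "BYTES"
    high_water_mark: bool = False  # already an aggregate (peak/largest value)

def should_print_total(counter: Counter) -> bool:
    """Skip 'total' for durations and for already-aggregated counters."""
    if counter.unit in TIME_UNITS:
        return False
    return not counter.high_water_mark

counters = [
    Counter("BytesRead", "BYTES"),
    Counter("TotalTime", "TIME_NS"),
    Counter("PeakMemoryUsage", "BYTES", high_water_mark=True),
]
for c in counters:
    print(c.name, "-> total printed" if should_print_total(c) else "-> total omitted")
```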
--
To view, visit http://gerrit.cloudera.org:8080/23154
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8
Gerrit-Change-Number: 23154
Gerrit-PatchSet: 22
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Mon, 29 Dec 2025 21:30:39 +0000
Gerrit-HasComments: No