github-actions[bot] commented on code in PR #60777:
URL: https://github.com/apache/doris/pull/60777#discussion_r2942015245
##########
fe/fe-core/src/main/java/org/apache/doris/metric/SystemMetrics.java:
##########
@@ -54,10 +56,72 @@ public class SystemMetrics {
protected long buffers = 0;
// Memory in the pagecache (Diskcache and Shared Memory)
protected long cached = 0;
+ // CPU time metrics (in USER_HZ, typically 1/100th of a second)
+ // Time spent in user mode
+ protected long cpuUser = 0;
+ // Time spent in user mode with low priority (nice)
+ protected long cpuNice = 0;
+ // Time spent in system/kernel mode
+ protected long cpuSystem = 0;
+ // Time spent idle
+ protected long cpuIdle = 0;
+ // Time waiting for I/O to complete
+ protected long cpuIowait = 0;
+ // Time servicing hardware interrupts
+ protected long cpuIrq = 0;
+ // Time servicing software interrupts
+ protected long cpuSoftirq = 0;
+ // Time stolen by hypervisor (for VMs)
+ protected long cpuSteal = 0;
+ // Time spent running virtual CPU for guest OS
+ protected long cpuGuest = 0;
+ // Time spent running niced guest
+ protected long cpuGuestNice = 0;
+
+ // Package-private: allows tests to override the /proc/stat resource path
+ String cpuStatTestFile = "data/stat_normal";
+
Review Comment:
**[Suggestion]** This package-private mutable field exists solely for test
injection. It leaks test infrastructure into the production class. Consider
using a `protected` method that returns the file path (which tests can
override), or a constructor parameter, to keep test concerns separated from
production code.
##########
fe/fe-core/src/main/java/org/apache/doris/metric/SystemMetrics.java:
##########
@@ -145,4 +209,140 @@ private void updateMemoryMetrics() {
}
}
+ private void updateCpuMetrics() {
+ String procFile = "/proc/stat";
+ if (FeConstants.runningUnitTest) {
+ procFile =
getClass().getClassLoader().getResource(cpuStatTestFile).getFile();
+ }
+
+ try (FileReader fileReader = new FileReader(procFile);
+ BufferedReader br = new BufferedReader(fileReader)) {
+ String line;
+ boolean cpuLineFound = false;
+
+ // Store previous values for delta calculation
+ long prevTotal = prevCpuUser + prevCpuNice + prevCpuSystem +
prevCpuIdle
+ + prevCpuIowait + prevCpuIrq + prevCpuSoftirq +
prevCpuSteal;
+
+ while ((line = br.readLine()) != null) {
+ if (line.startsWith("cpu ")) { // Overall CPU stats (not
per-core)
+ cpuLineFound = true;
+ String[] parts = WHITESPACE.split(line);
Review Comment:
**[Robustness]** When `parts.length < 8` (malformed `cpu` line),
`cpuLineFound` is set to `true` here but none of the if/else-if branches
execute, so no CPU fields are updated. The code then proceeds to the percentage
calculation at line 290 using stale/zero values, which silently produces
incorrect metrics.
Consider adding an `else` clause that either logs a warning and sets
`cpuLineFound = false`, or explicitly handles the malformed case:
```java
} else {
LOG.warn("unexpected /proc/stat cpu line format, parts.length={}: {}",
parts.length, line);
cpuLineFound = false;
}
```
##########
fe/fe-core/src/main/java/org/apache/doris/metric/SystemMetrics.java:
##########
@@ -145,4 +209,140 @@ private void updateMemoryMetrics() {
}
}
+ private void updateCpuMetrics() {
+ String procFile = "/proc/stat";
+ if (FeConstants.runningUnitTest) {
+ procFile =
getClass().getClassLoader().getResource(cpuStatTestFile).getFile();
+ }
+
+ try (FileReader fileReader = new FileReader(procFile);
+ BufferedReader br = new BufferedReader(fileReader)) {
+ String line;
+ boolean cpuLineFound = false;
+
+ // Store previous values for delta calculation
+ long prevTotal = prevCpuUser + prevCpuNice + prevCpuSystem +
prevCpuIdle
+ + prevCpuIowait + prevCpuIrq + prevCpuSoftirq +
prevCpuSteal;
+
+ while ((line = br.readLine()) != null) {
+ if (line.startsWith("cpu ")) { // Overall CPU stats (not
per-core)
+ cpuLineFound = true;
+ String[] parts = WHITESPACE.split(line);
+ if (parts.length >= 11) {
+ // Full format with guest/guest_nice (kernel >= 2.6.24
with guest, >= 2.6.33 with guest_nice)
+ cpuUser = Long.parseLong(parts[1]);
+ cpuNice = Long.parseLong(parts[2]);
+ cpuSystem = Long.parseLong(parts[3]);
+ cpuIdle = Long.parseLong(parts[4]);
+ cpuIowait = Long.parseLong(parts[5]);
+ cpuIrq = Long.parseLong(parts[6]);
+ cpuSoftirq = Long.parseLong(parts[7]);
+ cpuSteal = Long.parseLong(parts[8]);
+ cpuGuest = Long.parseLong(parts[9]);
+ cpuGuestNice = Long.parseLong(parts[10]);
+ } else if (parts.length >= 9) {
+ // Format with steal but without guest/guest_nice
(kernel >= 2.6.11)
+ cpuUser = Long.parseLong(parts[1]);
+ cpuNice = Long.parseLong(parts[2]);
+ cpuSystem = Long.parseLong(parts[3]);
+ cpuIdle = Long.parseLong(parts[4]);
+ cpuIowait = Long.parseLong(parts[5]);
+ cpuIrq = Long.parseLong(parts[6]);
+ cpuSoftirq = Long.parseLong(parts[7]);
+ cpuSteal = Long.parseLong(parts[8]);
+ cpuGuest = 0;
+ cpuGuestNice = 0;
+ } else if (parts.length >= 8) {
+ // Older format without steal (kernel < 2.6.11)
+ cpuUser = Long.parseLong(parts[1]);
+ cpuNice = Long.parseLong(parts[2]);
+ cpuSystem = Long.parseLong(parts[3]);
+ cpuIdle = Long.parseLong(parts[4]);
+ cpuIowait = Long.parseLong(parts[5]);
+ cpuIrq = Long.parseLong(parts[6]);
+ cpuSoftirq = Long.parseLong(parts[7]);
+ cpuSteal = 0;
+ cpuGuest = 0;
+ cpuGuestNice = 0;
+ }
+ } else if (line.startsWith("ctxt ")) {
+ ctxt = parseSingleLongFromLine(line);
+ } else if (line.startsWith("processes ")) {
+ processes = parseSingleLongFromLine(line);
+ } else if (line.startsWith("procs_running ")) {
+ procsRunning = parseSingleLongFromLine(line);
+ } else if (line.startsWith("procs_blocked ")) {
+ procsBlocked = parseSingleLongFromLine(line);
+ }
+ }
+
+ // Validate that CPU line was found
+ if (!cpuLineFound) {
+ LOG.warn("failed to get /proc/stat: cpu line not found");
+ return;
+ }
+
+ // Calculate percentages based on delta (skip on first call to
avoid inflated values)
+ long total = cpuUser + cpuNice + cpuSystem + cpuIdle + cpuIowait
+ + cpuIrq + cpuSoftirq + cpuSteal;
+
+ // Only calculate percentages if this is not the first call
(prevTotal > 0)
+ if (prevTotal > 0) {
+ long delta = total - prevTotal;
+
+ if (delta > 0) {
+ // Calculate usage as percentage of non-idle time
+ long idleDelta = cpuIdle - prevCpuIdle;
+ long userDelta = cpuUser - prevCpuUser;
+ long systemDelta = cpuSystem - prevCpuSystem;
+ long iowaitDelta = cpuIowait - prevCpuIowait;
+ long stealDelta = cpuSteal - prevCpuSteal;
+
+ cpuUsagePercent = 100.0 * (delta - idleDelta) / delta;
+ cpuUserPercent = 100.0 * userDelta / delta;
+ cpuSystemPercent = 100.0 * systemDelta / delta;
+ cpuIowaitPercent = 100.0 * iowaitDelta / delta;
+ cpuStealPercent = 100.0 * stealDelta / delta;
+ }
+ }
+
+ // Store current CPU values as previous for next iteration
+ prevCpuUser = cpuUser;
+ prevCpuNice = cpuNice;
+ prevCpuSystem = cpuSystem;
+ prevCpuIdle = cpuIdle;
+ prevCpuIowait = cpuIowait;
+ prevCpuIrq = cpuIrq;
+ prevCpuSoftirq = cpuSoftirq;
+ prevCpuSteal = cpuSteal;
+
+ // Calculate context switch and process fork rates (skip on first
call)
+ long currentTime = System.currentTimeMillis();
+ long timeDelta = currentTime - lastUpdateTime;
+
+ if (timeDelta > 0) {
+ double timeInSeconds = timeDelta / 1000.0;
+ if (prevCtxt > 0) {
+ ctxtRate = (ctxt - prevCtxt) / timeInSeconds;
+ }
+ if (prevProcesses > 0) {
+ processesRate = (processes - prevProcesses) /
timeInSeconds;
+ }
+ }
+
+ // Store current values as previous for next iteration
+ prevCtxt = ctxt;
+ prevProcesses = processes;
+ lastUpdateTime = currentTime;
+
+ } catch (Exception e) {
+ LOG.warn("failed to get /proc/stat: {}", e.getMessage(), e);
+ }
+ }
+
+ private long parseSingleLongFromLine(String line) {
+ String[] parts = WHITESPACE.split(line);
Review Comment:
**[Robustness]** `Long.parseLong(parts[1])` can throw
`NumberFormatException` if the value is malformed. While the outer `catch
(Exception e)` block will handle it, the exception short-circuits the rest of
`updateCpuMetrics()`, skipping the updates to `prevCtxt`, `prevProcesses`, and
`lastUpdateTime` (lines 325-336). On the next successful invocation, the stale
`prev*` values combined with the new `lastUpdateTime` delta will produce a
spurious spike in `ctxtRate` and `processesRate`.
Consider wrapping the parse in a try-catch or adding validation:
```java
private long parseSingleLongFromLine(String line) {
String[] parts = WHITESPACE.split(line);
if (parts.length >= 2) {
try {
return Long.parseLong(parts[1]);
} catch (NumberFormatException e) {
LOG.warn("failed to parse long from line: {}", line);
}
}
return 0;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]