On Thu, 26 Oct 2023 21:01:53 GMT, Jonathan Joo <j...@openjdk.org> wrote:
>> 8315149: Add hsperf counters for CPU time of internal GC threads
>
> Jonathan Joo has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Remove StringDedup from GC thread list

> Okay, these counters can be accessed frequently, but is it necessary for them
> to provide up-to-date information on every access? If not, what level of
> delay is acceptable? (In the Parallel case, I believe the counters can be
> outdated for the duration of the full-gc.)

Besides Jonathan's point, in our experience updating these counters once every second is good enough for AHS. Updating once every 2-3 seconds might even be OK. We just don't want the counters to be outdated for tens or hundreds of seconds.

Also, the "tens or hundreds of seconds" delay is mainly a problem for concurrent mark, and less of a problem for GC pauses such as a full GC, because:

- Multi-second GC pauses are uncommon compared to multi-second concurrent mark cycles.
- GC pauses are frequent. It is OK for AHS to get slightly outdated information during an ongoing GC pause, because AHS can quickly influence the next GC pause once the ongoing pause finishes and updates the counters. This is also why we only refresh `sun.threads.total_gc_cpu_time` after a GC pause.

> My primary concern is that the change in G1 is too intrusive -- the logic for
> tracking/collecting thread-CPU is scattered in many places. Additionally, the
> situation is expected to worsen in the near future, based on the statement "I
> can create a separate RFE to make it update more frequently..."

I think most G1 changes in this PR are straightforward and easy to maintain. With the exception of concurrent mark (`sun.threads.cpu_time.gc_conc_mark`), each counter is updated in exactly one place. As part of [JDK-8318941](https://bugs.openjdk.org/browse/JDK-8318941), we could find a way to update `sun.threads.cpu_time.gc_conc_mark` in only one place as well.
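To make the refresh policy discussed earlier in this reply concrete, here is a minimal sketch in Python. It is not how HotSpot implements hsperf counters (that is C++ code in the VM); `ThrottledCpuCounter`, `add`, and `published_ns` are hypothetical names. The sketch assumes a single updater thread that accumulates GC-thread CPU-time deltas and publishes them to the reader-visible value at most once per interval (1 second here, matching the AHS experience above). A forced publish models refreshing a counter right after a GC pause ends.

```python
class ThrottledCpuCounter:
    """Hypothetical helper (not a HotSpot API): accumulates GC-thread CPU time
    and publishes it to readers at most once per min_interval_s, unless a
    publish is forced, e.g. when a GC pause ends."""

    def __init__(self, min_interval_s: float = 1.0, start_s: float = 0.0) -> None:
        self.min_interval_s = min_interval_s
        self.published_ns = 0           # value a reader (e.g. AHS) would see
        self._pending_ns = 0            # CPU time accumulated but not yet published
        self._last_publish_s = start_s  # timestamp of the most recent publish

    def add(self, cpu_delta_ns: int, now_s: float, force: bool = False) -> bool:
        """Record cpu_delta_ns of GC-thread CPU time observed at time now_s.

        Publishes all pending time if forced or if at least min_interval_s
        has elapsed since the last publish. Returns True if it published."""
        self._pending_ns += cpu_delta_ns
        if force or now_s - self._last_publish_s >= self.min_interval_s:
            self.published_ns += self._pending_ns
            self._pending_ns = 0
            self._last_publish_s = now_s
            return True
        return False


# Usage: a long concurrent mark reports CPU time in small increments,
# and a GC pause forces a publish when it ends.
counter = ThrottledCpuCounter(min_interval_s=1.0)
counter.add(5_000_000, now_s=0.3)              # too soon: stays pending
counter.add(7_000_000, now_s=1.1)              # 1 s elapsed: published
print(counter.published_ns)                    # 12000000
counter.add(2_000_000, now_s=1.4, force=True)  # end of GC pause: published
print(counter.published_ns)                    # 14000000
```

Bounding staleness by time, rather than publishing on every delta, keeps update cost low. Readers stay at most about one interval behind during a long concurrent phase, which is the property that matters for AHS.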

In addition, we think it is well worth the effort for G1 (or any modern garbage collector) to keep track of the CPU time spent by its GC threads. Besides the monitoring benefits for users and external tools like AHS, it could open up opportunities for G1 to develop better heuristics based on CPU time. We find the CPU time spent by GC threads to be a better measure of GC overhead than wall-clock (pause) time. There was [a discussion](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2021-May/035241.html) about `GCTimeRatio` and CPU time. Even today, after [JDK-8253413](https://bugs.openjdk.org/browse/JDK-8253413), `GCTimeRatio` still only accounts for pause time. We hope the JVM could provide a `GCCpuRatio` flag and resize its heap to respect it. AHS actually tries to partly achieve the effect of a `GCCpuRatio` flag.

> Also, why isn't `G1ServiceThread` part of the change? I would expect all
> subclasses of ConcurrentGCThread to be taken into account. Is this omission
> intentional?

Thanks. We missed this thread in our accounting by oversight; it was previously named `G1YoungRemSetSamplingThread`. @jjoo172, could you add an hsperf counter for `G1ServiceThread`?

> Finally, thread-CPU time is something tracked at the OS level, so it's a bit
> odd that one has to instrument the VM to get that information.
> According to https://man7.org/linux/man-pages/man5/proc.5.html, "(14) utime"
> + "(15) stime %lu" == thread-cpu-time. cat `/proc/<java-pid>/task/*/stat`
> lists all VM internal threads, including GC, JIT, and etc.

It is possible, but it is a lot of work for users. `/proc/<java-pid>/task/*/stat` lists all Java threads from the application as well, so users would need deep JVM knowledge to find out which threads are GC threads, which are JIT threads, etc. Quite a few other issues come along as well:

- What if the JVM's internal threads are renamed across different JDK versions? (E.g. `GC Thread#N` was named `Gang worker#N` in JDK 8.)
- Different tools that need to read CPU time would each need to implement a parser for `/proc/<java-pid>/task/*/stat` and deal with similar problems.
- What if a tool needs to support multiple OSes, including Windows, which has no `/proc` filesystem?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1786267159