Hi Devs,

I would like to start a discussion on FLIP-361: Improve GC Metrics [1].

The current Flink GC metrics [2] are not very useful for monitoring purposes, as they require post-processing logic that also depends on the current runtime environment.

Problems:
- Total time is not very relevant for long-running applications; only the rate of change (msPerSec) is.
- In most cases it's best to simply aggregate the time/count across the different GarbageCollectors; however, the specific collectors depend on the current Java runtime.

We propose to improve the current situation by:
- Exposing rate metrics per GarbageCollector
- Exposing aggregated Total time/count/rate metrics

These new metrics are all derived from the existing ones with minimal overhead.

Looking forward to your feedback.

Cheers,
Gyula

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
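
P.S. To make the derivation concrete, here is a rough sketch of how the aggregated time/count and the msPerSec rate could be computed from the cumulative values the JVM already exposes through GarbageCollectorMXBean. This is only an illustration, not the implementation proposed in the FLIP: the class name GcRateSketch and its helper methods are made up for this example, and the sampling interval is an arbitrary choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
public class GcRateSketch {

    /** Sums cumulative GC time (ms) across all collectors, skipping undefined (-1) values. */
    static long totalTimeMs(List<GarbageCollectorMXBean> beans) {
        long total = 0;
        for (GarbageCollectorMXBean bean : beans) {
            long time = bean.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Sums cumulative GC count across all collectors, skipping undefined (-1) values. */
    static long totalCount(List<GarbageCollectorMXBean> beans) {
        long total = 0;
        for (GarbageCollectorMXBean bean : beans) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** GC milliseconds spent per wall-clock second between two samples of the cumulative time. */
    static double msPerSec(long prevTotalMs, long currTotalMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 0.0; // no time elapsed, so no meaningful rate
        }
        return (currTotalMs - prevTotalMs) * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // The set of collectors depends on the Java runtime, hence the aggregation.
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();

        long prevTime = totalTimeMs(beans);
        long prevNanos = System.nanoTime();

        Thread.sleep(1000); // arbitrary sampling interval for the example

        long currTime = totalTimeMs(beans);
        long elapsedMs = (System.nanoTime() - prevNanos) / 1_000_000;

        System.out.println("Total GC count: " + totalCount(beans));
        System.out.println("Total GC time (ms): " + currTime);
        System.out.println("GC msPerSec: " + msPerSec(prevTime, currTime, elapsedMs));
    }
}
```

The same msPerSec calculation applies per collector if a single bean's getCollectionTime() is sampled instead of the aggregated total.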
