Hi Guido, sorry for the late reply. You were collecting the stats every second. AFAIK, Flink internally collects the stats every 5 seconds, so several of your consecutive records probably show the same values. You can change either your polling interval or Flink's (I think it's taskmanager.heartbeat-interval).
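To make the effect of the interval mismatch concrete, here is a minimal sketch of how the polled records could be post-processed. It assumes the GC Time and Count values are cumulative since JVM start, which is how the Java GarbageCollectorMXBean reports them (time in milliseconds), and that Flink passes them through unchanged. The function name gc_deltas and the sample numbers are made up for illustration; they are not taken from your job.

```python
def gc_deltas(samples):
    """Turn cumulative GC readings into per-interval increments.

    samples: list of (time_ms, count) tuples, oldest first, as polled
    from the TaskManager stats (hypothetical shape, for illustration).
    Consecutive identical readings are dropped first, since polling
    faster than the stats are refreshed returns the same snapshot
    several times.
    """
    distinct = []
    for reading in samples:
        if not distinct or reading != distinct[-1]:
            distinct.append(reading)
    # Difference between each distinct reading and the one before it.
    return [
        (cur[0] - prev[0], cur[1] - prev[1])
        for prev, cur in zip(distinct, distinct[1:])
    ]


# Made-up readings: polled every second, refreshed less often.
samples = [(100, 1), (100, 1), (100, 1), (250, 2), (250, 2), (400, 3)]
print(gc_deltas(samples))  # [(150, 1), (150, 1)]
```

Dropping repeats also merges genuine refreshes in which no collection ran, but those would contribute a zero increment anyway, so the total GC time (last reading minus first) is unaffected.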
Regarding the details on PS-Scavenge, MarkSweep etc.: We just use the names the Java management beans return, so you can just google for the names and read how to interpret them. For example: http://www.ibm.com/developerworks/library/j-jtp11253/

The load is the operating system load.

On Thu, Feb 4, 2016 at 10:25 PM, Guido <gmazza...@gmail.com> wrote:

> Hello,
>
> I have few questions regarding garbage collector’s stats on Taskmanagers
> and any help or further documentation would be great.
> I have collected “1 second polling requesting" stats on 7 Taskmanagers,
> through the relative request (/taskmanagers/<idtaskmanager>/) of the
> Monitoring REST API while a job, that overall took 38 seconds, was
> running.
>
> This way got 38 records for each TaskManager and focusing on garbage
> collector’s stats I can see, for example on 1 of the 38th records:
>
> - PS-Scavenge.Time: 2597, PS-MarkSweep.Time: 29016;
> 1. Is It correct to assume they represent the total elapsed time on
> different GCs (respectively young and old gen)? So, I basically got a
> running sum distribution?
> 2. If yes, values are in mills, so 29 sec?
>
> 3. Could they be used to get how much time has been wasted in total
> because of the “Stop-the-world” GCs policy?
>
> Finally, on the same record:
>
> - PS-Scavenge.Count: 3, PS-MarkSweep.Time: 5, load: 3.73.
>
> 4. Is it the “load” value tightly related?
>
> Sorry if it has been quite long and thanks a lot.
>
> Guido