Hi Lydia,

I used sar for monitoring (sar -u -n DEV -p -d -r 1) and plotted the averages 
over multiple nodes.

1) For each node, collect the sar output. You will get, for example:

Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr)     2016-01-27      _x86_64_        (16 CPU)
12:54:09        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:54:10        all      4.63      0.00      3.25      0.13      0.00     91.99
12:54:09    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
12:54:10    129538812   2525308      1.91      1292     85876   3662636      2.69   2111652     55132
12:54:09          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:54:10          sda     28.71   2708.91     87.13     97.38      0.03      1.10      0.97      2.77
12:54:09        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
12:54:10         eth0    632.67    587.13   3173.60     58.47      0.00      0.00      0.00
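
As a rough illustration of how the per-node output above could be turned into records, here is a minimal Python sketch. It is not the script I used. The function name parse_sar is hypothetical, and it assumes 24-hour timestamps, one sar line per text line (unwrapped), and the four sections shown above.

```python
def parse_sar(text):
    """Parse interleaved sar sections into a list of row dicts."""
    # First column name of each section header (assumption: only these four sections)
    section_keys = {"CPU", "kbmemfree", "DEV", "IFACE"}
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the system banner and trailing "Average:" summaries
        if len(fields) < 2 or fields[0].startswith(("Linux", "Average")):
            continue
        if fields[1] in section_keys:
            header = fields[1:]  # remember the column names of this section
            continue
        if header is None or len(fields) - 1 != len(header):
            continue  # not a data line that matches the current header
        row = {"time": fields[0]}
        for name, value in zip(header, fields[1:]):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # labels such as "all", "sda", "eth0"
        rows.append(row)
    return rows


SAMPLE = """\
Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr)     2016-01-27      _x86_64_        (16 CPU)
12:54:09        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:54:10        all      4.63      0.00      3.25      0.13      0.00     91.99
12:54:09          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:54:10          sda     28.71   2708.91     87.13     97.38      0.03      1.10      0.97      2.77
"""

rows = parse_sar(SAMPLE)
for row in rows:
    print(row)
```

Each row keeps its timestamp, so rows from different nodes can later be matched by time.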

2) Calculate the average over your nodes (make sure their clocks are 
synchronized) to obtain a final output over which you can run your plot scripts:

LINE      DATE      FILENAME                 CPU_user  CPU_SYS   KBMEMFREE KBMEMUSED MEMUSED   DISK_UTIL DISK_RKBs DISK_WKBs _IO_RSTs  _IO_WSTs
1         12:54:10  res1Avg                  6.12      1.25      129554704 2509412   1.90      6.00      4253.63   87.04     3944.00   88.00
2         12:54:11  res1Avg                  3.41      0.28      129523432 2540690   1.92      4.00      2335.82   51.62     2692.00   0.00
3         12:54:12  res1Avg                  0.06      0.03      129522000 2542120   1.92      1.60      0.16      0.59      2048.00   32.00
4         12:54:13  res1Avg                  0.09      0.06      129520936 2543182   1.92      0.60      0.19      0.59      2048.00   0.00
5         12:54:14  res1Avg                  0.06      0.06      129518448 2545670   1.93      6.80      4.31      169.47    4044.00   16.00
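
The averaging step could look roughly like the Python sketch below. This is an illustration, not my actual script. The function name average_over_nodes, the node names and all values in the example are made up, and it assumes each node's samples are already available as dicts with a "time" key and numeric metric values. Samples are grouped by timestamp, which is why the node clocks need to be in sync.

```python
from collections import defaultdict
from statistics import mean


def average_over_nodes(per_node_rows, metrics):
    """Average the given metrics across nodes for each timestamp."""
    # timestamp -> metric name -> list of values, one per reporting node
    grouped = defaultdict(lambda: defaultdict(list))
    for rows in per_node_rows.values():
        for row in rows:
            for metric in metrics:
                if metric in row:
                    grouped[row["time"]][metric].append(row[metric])
    # HH:MM:SS strings sort chronologically within a single day
    return {
        time: {metric: round(mean(values), 2) for metric, values in by_metric.items()}
        for time, by_metric in sorted(grouped.items())
    }


# Made-up values for two hypothetical nodes
nodes = {
    "node-1": [{"time": "12:54:10", "%user": 4.0, "%system": 1.0},
               {"time": "12:54:11", "%user": 2.0, "%system": 0.5}],
    "node-2": [{"time": "12:54:10", "%user": 8.0, "%system": 2.0},
               {"time": "12:54:11", "%user": 4.0, "%system": 0.5}],
}

averaged = average_over_nodes(nodes, ["%user", "%system"])
for time, values in averaged.items():
    print(time, values)
```

A timestamp reported by only some nodes is averaged over those nodes alone. If that is not what you want, drop timestamps that are missing on any node before plotting.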

For other metrics specific to Flink’s execution, you may need to rely on the 
metrics that Flink itself currently exposes.

Best,
Ovidiu

> On 21 Dec 2016, at 19:55, Lydia Ickler <ickle...@googlemail.com> wrote:
> 
> Hi all,
> 
> I have a question regarding the Monitoring REST API;
> 
> I want to analyze the behavior of my program with regards to I/O MiB/s, 
> Network MiB/s and CPU % as the authors of this paper did. 
> (https://hal.inria.fr/hal-01347638v2/document)
> From the JSON file at http://master:8081/jobs/jobid/ I get a summary including 
> the information of read/write records and read/write bytes.
> Unfortunately the entries of Network or CPU are either (unknown) or 0.0. I am 
> running my program on a cluster with up to 32 nodes.
> 
> Where can I find the values for e.g. CPU or Network?
> 
> Thanks in advance!
> Lydia
> 
