[DISCUSS] Top & Sampling Rates (kvm.resource.LibvirtComputingResource)

Marty Sweet Wed, 26 Feb 2014 23:59:16 -0800

Hi Guys,

Does anyone have any ideas about this?
My main concern is the KVM resource collector and I assume the other
hypervisor setups are receiving the wrong values.


The CSAgent periodically runs the following command:
 [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n
1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle

===
When running top manually I get the same results (2 secs after each command):
======
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n2 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 29.2%us,  1.1%sy,  0.0%ni, 69.1%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
=======
Apparently this is because:
"This is because top, vmstat, iostat all in their first run collect
data since the last reboot time of the system.
And the successive iterations run on the sampling period that you
specify. So, in the first run of top, you will see the %idle time
because from the time of reboot to the time of running top, it was
that much % idle. But in next iterations, since it is busy it doesn't
show any %idle.
Exclude the first iteration and try sampling over the interval you want."
http://serverfault.com/questions/436446/top-showing-64-idle-on-first-screen-or-batch-run-while-there-is-no-idle-time-a
========
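
To illustrate the "exclude the first iteration" advice, here is a rough sketch that computes the idle percentage over an interval from two snapshots of the aggregate "cpu" line in /proc/stat. This is not what the agent does today; the function name idle_pct_between is made up for this example, and summing only the first eight counters (user through steal) is an assumption about which fields matter.

```shell
#!/bin/bash
# Hypothetical helper (not part of CloudStack): idle percentage between two
# aggregate "cpu" lines from /proc/stat.
# Assumed field order: cpu user nice system idle iowait irq softirq steal ...
idle_pct_between() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s1, " "); split(b, s2, " ")
        # Index 1 is the "cpu" label; sum user..steal (indices 2..9).
        for (i = 2; i <= 9; i++) { t1 += s1[i]; t2 += s2[i] }
        total = t2 - t1          # jiffies elapsed between the snapshots
        idle  = s2[5] - s1[5]    # idle jiffies elapsed between the snapshots
        if (total <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * idle / total
    }'
}

# Example use on a live Linux host (left commented out on purpose):
#   first=$(grep '^cpu ' /proc/stat); sleep 1
#   second=$(grep '^cpu ' /proc/stat)
#   idle_pct_between "$first" "$second"
```

A single snapshot only gives the counters accumulated since boot, which matches the behaviour described in the quote above; the difference between two snapshots gives the value for the interval.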

Wouldn't this result in Cloudstack-Agent getting the wrong idle value
for the system?

This hasn't been fixed in 4.3.0, so I will create a patch along the
following lines (if others agree):
/bin/bash -c idle=$(top -d0.10 -b -n 2|grep Cpu\(s\):|tail -n1|cut -d%
-f4|cut -d, -f2);echo $idle
-> top -d0.10 shortens the refresh interval so the command completes
faster.
-> tail -n1 gets the last line of the output (the latest idle value)
===
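
As a possible variation on the pipeline above, the sketch below picks the idle value out of the last "Cpu(s)" line by looking for the number in front of "id" instead of counting '%' and ',' separated fields. The function name last_idle_from_top is hypothetical, and the pattern is only an assumption based on the top output shown in this mail; other top versions may need a different pattern.

```shell
#!/bin/bash
# Hypothetical helper (not the agent's actual code): read `top -b -n 2`
# output on stdin and print the idle percentage from the LAST "Cpu(s)"
# line, i.e. the sample measured over the refresh interval.
last_idle_from_top() {
    grep 'Cpu(s)' |
        tail -n 1 |
        # Capture the number directly before "id", with or without a '%'.
        sed -E 's/.*[ ,]([0-9.]+)[ %]*id.*/\1/'
}

# Example use on a live host (left commented out on purpose):
#   top -d0.10 -b -n 2 | last_idle_from_top
```

Fed the two-line "top -b -n2" output shown earlier, this prints 69.1 rather than the since-boot 92.4.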


Let me know what you think,
Regards,
Marty


---------- Forwarded message ----------
From: Marty Sweet <msweet....@gmail.com>
Date: Sun, Feb 23, 2014 at 1:20 PM
Subject: Segfault: Top & Sampling Rates (kvm.resource.LibvirtComputingResource)
To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>
Cc: "us...@cloudstack.apache.org" <us...@cloudstack.apache.org>


Hi,

I have just noticed the following error message appearing occasionally
in kern.log. This is happening on all but 1 of my nodes. Is anyone else
experiencing this issue?
=====
Feb 23 06:53:24 aurora kernel: [10185338.400091] top[27631]: segfault
at 0 ip 00007f025eba3315 sp 00007fff3f9ed308 error 4 in
libc-2.15.so[7f025ea6f000+1b5000]
=====

I happened to have one of the nodes in trace mode, showing
cloudstack-agent is starting it:

/var/log/cloudstack/agent/agent.log
======
2014-02-23 06:53:23,654 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-1:null) Executing: /bin/bash -c idle=$(top -b -n
1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
2014-02-23 06:53:23,661 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n
1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
======

## This led me on to find the following (potential) bug:

When running this manually I get the same result (2 secs after each command):
======
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
root@aurora:/var/log/cloudstack/agent# top -b -n2 | grep Cpu
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 92.4%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 29.2%us,  1.1%sy,  0.0%ni, 69.1%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
=======
Apparently this is because:
"This is because top, vmstat, iostat all in their first run collect
data since the last reboot time of the system.
And the successive iterations run on the sampling period that you
specify. So, in the first run of top, you will see the %idle time
because from the time of reboot to the time of running top, it was
that much % idle. But in next iterations, since it is busy it doesn't
show any %idle.
Exclude the first iteration and try sampling over the interval you want."
http://serverfault.com/questions/436446/top-showing-64-idle-on-first-screen-or-batch-run-while-there-is-no-idle-time-a
========

Wouldn't this result in Cloudstack-Agent getting the wrong idle value
for the system?

If this hasn't been fixed in 4.3.0, I will create a patch along the
following lines (if others agree):
/bin/bash -c idle=$(top -d0.01 -b -n 2|grep Cpu\(s\):|tail -n1|cut -d%
-f4|cut -d, -f2);echo $idle
-> top -d0.01 shortens the refresh interval so the command completes
faster.
-> tail -n1 gets the last line of the output (the latest idle value)

Ubuntu 12.04 / KVM / CS 4.2.0
Linux aurora 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7
16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Marty


-- 
Marty
