Stefan,
If you are looking for network usage data, the listUsageRecords API will
provide more accurate info.

~kishan

-----Original Message-----
From: Stefan Engstrom [mailto:sengst...@ena.com]
Sent: Thursday, July 28, 2016 10:04 AM
To: dev <dev@cloudstack.apache.org>
Subject: Usage data reported by listvirtualmachines incomplete in setup with 
multiple management servers

Hello all - we are standing up a cloud service using CloudStack 4.8 with some 
select PRs included.

We base usage rate reporting on differentials of the data served by GET 
listvirtualmachines. While validating our production build, which has three 
management servers, we noticed that this approach under-reports network r/w 
and disk r/w volumes (and most likely iops as well).
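
To make the "differentials" approach concrete, here is a minimal sketch of how
a rate could be derived from two successive samples of one VM. The field names
(networkkbsread, networkkbswrite) are assumed to match what listvirtualmachines
returns in our build, and the Sample type and rate_kb_per_s helper are
hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One reading of a VM's cumulative counters (field names are assumptions)."""
    timestamp: float        # seconds since some fixed epoch
    networkkbsread: int     # cumulative KB read, as reported by the API
    networkkbswrite: int    # cumulative KB written, as reported by the API


def rate_kb_per_s(prev: Sample, curr: Sample, field: str) -> Optional[float]:
    """Return the average rate between two samples, or None on a counter reset."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = getattr(curr, field) - getattr(prev, field)
    if delta < 0:
        # The counter went backwards (for example after a restart), so no
        # meaningful rate can be computed for this interval.
        return None
    return delta / dt
```

Note that this only yields a correct rate if the counter being sampled is
complete, which is exactly what breaks down in the case described below.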


The test case and diagnostics we looked at today are the following:

1. curl a decent (11.7 GB) payload to disk in a CentOS 7 VM

2. Check the network transfer on the VM's nic with "ip -s link"

3. Verify the transfer on the ACS host with virsh.

4. Check reporting from the individual management hosts - the *sum* of what 
they report via GET listvirtualmachines matches the volume in steps 1-3 above.


The load-balanced IP for the management cluster picks a host at random (and 
has you stick to it), so you get a usage number anywhere between 0 and the 
correct value, probably depending on how the startup of the individual 
management hosts was staggered.


From a cursory look at the code we think the agent on the host collects 
differentials every time it is queried by a management host, reports the diff, 
and the management host proceeds to accumulate that. A little backwards 
perhaps, but it explains the behavior outlined above.
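
A toy simulation of that suspected behavior, to show why each management host
would end up with only a share of the total. This is a model of our reading of
the code, not the actual ACS implementation; the Agent and ManagementServer
classes and the round-robin polling order are assumptions made for
illustration.

```python
class Agent:
    """Host agent that reports only what changed since the last poll by anyone."""

    def __init__(self):
        self.total = 0           # true cumulative KB transferred
        self.last_reported = 0   # value of total at the previous poll

    def transfer(self, kb):
        self.total += kb

    def poll(self):
        diff = self.total - self.last_reported
        self.last_reported = self.total
        return diff


class ManagementServer:
    """Management host that accumulates whatever diffs it happens to receive."""

    def __init__(self):
        self.accumulated = 0

    def collect(self, agent):
        self.accumulated += agent.poll()


def simulate(num_servers=3, rounds=9, kb_per_round=100):
    agent = Agent()
    servers = [ManagementServer() for _ in range(num_servers)]
    for i in range(rounds):
        agent.transfer(kb_per_round)
        # Assumed polling order: the servers take turns querying the agent.
        servers[i % num_servers].collect(agent)
    return agent.total, [s.accumulated for s in servers]
```

In this model every diff is consumed by exactly one management host, so no
single host sees the full volume, but the per-host numbers add up to the true
total, which matches step 4 above.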


So, I wonder how other people deal with this, or even if there is awareness of 
the issue when you have multiple management servers.


My planned work-around for now is to poll the management servers individually 
and simply add up the results, but I am curious to hear other ideas. Anybody 
using the ACS usage server or the graphite integration for this kind of 
reporting?
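
For reference, a rough sketch of the add-up step of that work-around. It
assumes one parsed JSON listvirtualmachines response per management server has
already been fetched, and that the response layout and counter field names
below match our build; sum_across_servers and COUNTER_FIELDS are hypothetical
names.

```python
from collections import defaultdict

# Counter fields assumed to be present on each VM entry in the response.
COUNTER_FIELDS = (
    "networkkbsread",
    "networkkbswrite",
    "diskkbsread",
    "diskkbswrite",
    "diskioread",
    "diskiowrite",
)


def sum_across_servers(responses):
    """Sum per-VM counters over one parsed response per management server."""
    totals = defaultdict(lambda: dict.fromkeys(COUNTER_FIELDS, 0))
    for response in responses:
        body = response.get("listvirtualmachinesresponse", {})
        for vm in body.get("virtualmachine", []):
            for field in COUNTER_FIELDS:
                # A missing field counts as zero for that server.
                totals[vm["id"]][field] += vm.get(field, 0)
    return dict(totals)
```

The summed counters could then be fed into the same differential calculation
we use today, in place of the numbers from the load-balanced IP.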


Thanks for your thoughts,


Stefan Engström

Lead Research & Development Engineer
Education Networks of America

618 Grassmere Park Drive

Suite 12

Nashville, TN 37211

Phone: 615-312-6136
CTAC: 888-612-2880
Video @ https://ena.zoom.us/my/sengstrom
Mobile: 615-500-3223 <= Best option


