One good topic to try to pin down at the Ops meetup would be how to handle the 
flavour/aggregate/project/hypervisor mappings. We've got a local patch for some 
of this functionality, but we were never able to get agreement upstream on the 
right way to do it 
(https://blueprints.launchpad.net/nova/+spec/multi-tenancy-isolation-only-aggregates).

We do a fair amount of 'just how many of flavour X could I accept?' calculations, 
but once you add in the various combinations of availability zones and cells, it 
is easy to run out of capacity in one corner of the cloud. There is also the 
tetris problem, where one hypervisor has some memory left over and another has 
some CPU free, but you can't combine them (we'd need some hardware support to 
get that feature working…)
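
As a rough illustration of that flavour-fit calculation (not our actual tooling; 
the free-resource figures and field names below are made-up placeholders for 
whatever you pull out of nova hypervisor-show or your metrics system), a minimal 
Python sketch:

# Minimal sketch: how many instances of flavour X fit, per hypervisor and in total.
# The hosts list stands in for real per-host free-resource data; names are placeholders.

FLAVOUR_X = {"vcpus": 4, "ram_mb": 8192, "disk_gb": 80}

hosts = [
    {"name": "compute-01", "free_vcpus": 12, "free_ram_mb": 10240, "free_disk_gb": 400},
    {"name": "compute-02", "free_vcpus": 2,  "free_ram_mb": 65536, "free_disk_gb": 400},
]

def fits_on_host(host, flavour):
    """Instances of this flavour that fit on one host: the minimum across resources."""
    return min(host["free_vcpus"] // flavour["vcpus"],
               host["free_ram_mb"] // flavour["ram_mb"],
               host["free_disk_gb"] // flavour["disk_gb"])

per_host = {h["name"]: fits_on_host(h, FLAVOUR_X) for h in hosts}
total = sum(per_host.values())

# The tetris effect: compute-01 has spare vCPUs and compute-02 has spare RAM,
# but neither alone fits another flavour X, so the real total is lower than
# summing free vCPUs and free RAM separately would suggest.
print(per_host, "total flavour X slots:", total)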

Tim

From: matt [mailto:m...@nycresistor.com]
Sent: 16 January 2015 00:30
To: George Shuklin
Cc: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] Lets talk capacity monitoring

I've found histograms to be pretty useful for spotting patterns over sizeable 
time deltas... and anomaly detection on top of that can highlight stuff you 
might want to check out (i.e. raise the alert condition on that device).

An example: a histogram I did many moons ago to track disk sizes from our 
Nagios plugin that did dynamic disk-free analytics. I don't have any of the 
animated GIFs I made that showed the fluctuations over days... but those were 
great from a human visual sense.

I suppose this could be further automated and refined; I haven't been focused 
on this area for a while, though.
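
As a minimal sketch of the same idea (not the original Nagios plugin; the 
free-disk samples below are hypothetical stand-ins for real per-host metrics), 
bucketing per-host free-disk percentages into a histogram and flagging the 
obvious outliers:

# Rough illustration of the histogram approach; data values are hypothetical.
import numpy as np

# Percentage of disk free on each host, e.g. collected by a check plugin.
free_pct = np.array([62, 58, 61, 60, 59, 64, 12, 57, 63, 60])

counts, edges = np.histogram(free_pct, bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:3.0f}-{hi:3.0f}% free: {'#' * n}")

# A simple anomaly flag: hosts far below the fleet median deserve a closer look.
median = np.median(free_pct)
outliers = [p for p in free_pct if p < median - 30]
print("possible problem hosts (free %):", outliers)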

-Matt

On Thu, Jan 15, 2015 at 3:08 PM, George Shuklin 
<george.shuk...@gmail.com> wrote:
On 01/15/2015 06:43 PM, Jesse Keating wrote:
We have a need to better manage the various openstack capacities across our 
numerous clouds. We want to be able to detect when capacity of one system or 
another is approaching the point where it would be a good idea to arrange to 
increase that capacity. Be it volume space, VCPU capability, object storage 
space, etc...

What systems are you folks using to monitor and react to such things?

In our case we are using standard metrics (Ganglia) and monitoring (Shinken). I 
have thoughts about 'capacity planning', but the problem is that you cannot 
cleanly separate payload from wasted resources. For example, when a snapshot is 
created, it eats space on the compute node (for some configurations) beyond the 
flavor limits. When an instance boots, the _base image cache is used too (and if 
the instance is booting from a big snapshot, it uses more space in _base than in 
/instances). CPU can be heavily used by host-internal processes, and memory is 
shared with management software (which can be greedy too). IO can be overspent 
on snapshots and booting.
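
To make that concrete, a minimal sketch of comparing the _base cache with the 
instance disks on a compute node (the paths assume the default libvirt/nova 
layout and are an assumption, not a recommendation):

# Minimal sketch: compare space used by the _base image cache with the total
# under the instances path, to see how much is "invisible" to flavor accounting.
# Paths assume the default libvirt/nova layout; adjust for your deployment.
import os

INSTANCES_PATH = "/var/lib/nova/instances"
BASE_PATH = os.path.join(INSTANCES_PATH, "_base")

def dir_usage_bytes(path):
    """Sum of file sizes under a directory (a rough 'du -s' stand-in; note that
    this is apparent size, so sparse instance disks will look larger than the
    blocks they actually occupy)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished mid-walk
    return total

base = dir_usage_bytes(BASE_PATH)
everything = dir_usage_bytes(INSTANCES_PATH)
print(f"_base cache: {base / 2**30:.1f} GiB "
      f"({100 * base / max(everything, 1):.0f}% of {everything / 2**30:.1f} GiB total)")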

So we use cumulative graphs of free space, CPU usage and memory usage. That does 
not cover the flavor/aggregate/pinning-to-host-by-metadata cases, but it gives 
an overall feeling for how much free resource is available.
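
As a minimal sketch of how such a cumulative free-capacity figure might feed a 
Shinken/Nagios-style check (the thresholds and the hard-coded totals are 
assumptions; in practice the numbers would come from your metrics system):

#!/usr/bin/env python
# Minimal sketch of a Nagios/Shinken-style check on cumulative free capacity.
# The free/total figures are hard-coded assumptions standing in for real data.
import sys

free_ram_mb, total_ram_mb = 180_000, 1_000_000   # hypothetical cloud-wide totals
WARN, CRIT = 0.20, 0.10                          # alert when free fraction drops below

free_fraction = free_ram_mb / total_ram_mb
msg = f"free RAM {free_fraction:.0%} of {total_ram_mb} MB"

if free_fraction < CRIT:
    print("CRITICAL -", msg)
    sys.exit(2)
elif free_fraction < WARN:
    print("WARNING -", msg)
    sys.exit(1)
print("OK -", msg)
sys.exit(0)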


_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

