Re: [openstack-dev] [Tuskar] All needed Tuskar metrics and alerts mapped to what Ceilometer supports

Ladislav Smola Tue, 17 Sep 2013 05:30:15 -0700

Confirmation of the metrics collected by the Hardware agent (Baremetal agent)
=========================================


It is collecting:

- CPU, memory space, disk space and network traffic (the same agent will be running on all services, collecting the same data)


It should be running on:

- the physical servers on which Glance, Cinder, Quantum, Swift, the Nova compute nodes and the Nova controller run
- the network devices used in the OpenStack environment (switches, firewalls, ...)


Supported metrics
------------------------

* CPU utilisation for each CPU (percentage) (as cpu.util.1min, cpu.util.5min, cpu.util.15min)
* RAM utilisation (GB) (as memory.size.total, memory.size.used )
* Disk utilisation (GB) (as disk.size.total, disk.size.used)
* Incoming traffic for each NIC (Mbps) (as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (as network.outgoing.bytes)
- also track network.outgoing.errors, network.bandwidth.bytes
* Swap utilisation (GB)
- this should be part of Disk utilisation; we will just have to recognize the swap disk
* Number of currently running instances and the associated flavours (Ceilometer-Nova using instance:<type> and group_by resource_id)
- this info will be queried from the Overcloud Ceilometer
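The NIC traffic above is listed in Mbps, while the meters are byte counts. Below is a minimal sketch of the conversion, assuming network.incoming.bytes and network.outgoing.bytes are cumulative counters (as the Nova meters are described in the Ceilometer measurements docs; whether the Hardware agent reports them the same way still needs confirmation). The Sample class and bytes_to_mbps function are hypothetical helpers, not part of Ceilometer.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sample:
    """Hypothetical minimal sample: a cumulative byte counter at a point in time."""

    timestamp: datetime
    volume: float  # cumulative bytes seen so far by the counter


def bytes_to_mbps(prev: Sample, curr: Sample) -> float:
    """Turn two consecutive cumulative byte samples into megabits per second."""
    seconds = (curr.timestamp - prev.timestamp).total_seconds()
    if seconds <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_bytes = curr.volume - prev.volume
    if delta_bytes < 0:
        # The counter went backwards (e.g. NIC or agent restart): no valid rate.
        raise ValueError("counter reset detected")
    return delta_bytes * 8 / seconds / 1_000_000


prev = Sample(datetime(2013, 9, 17, 12, 0, 0), 1_000_000_000)
curr = Sample(datetime(2013, 9, 17, 12, 1, 0), 1_750_000_000)
print(bytes_to_mbps(prev, curr))  # 750 MB in 60 s -> 100.0
```

Rejecting counter resets instead of guessing keeps a restarted agent from showing a huge negative or bogus spike in the charts.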


Missing metrics
--------------------
* System load -- see /proc/loadavg (percentage)

as described here: https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
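A pollster for the missing system load metric could start from something like the sketch below. It assumes a Linux host, where the first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages, and it assumes "percentage" means the load divided by the number of CPUs. Both that interpretation and the load.* meter names are hypothetical, not an agreed Ceilometer meter definition.

```python
from __future__ import annotations

import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_as_percentage(load: float, cpu_count: int) -> float:
    """Express a load average as a percentage of the available CPUs.

    100% means the load equals the number of CPUs; values above 100% are possible.
    """
    return 100.0 * load / cpu_count


def read_system_load(path: str = "/proc/loadavg") -> dict[str, float]:
    """Read the host's load (Linux only) and return hypothetical load.* meters."""
    with open(path) as f:
        one, five, fifteen = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return {
        "load.1min": load_as_percentage(one, cpus),
        "load.5min": load_as_percentage(five, cpus),
        "load.15min": load_as_percentage(fifteen, cpus),
    }


# Example with a fixed /proc/loadavg line, so it runs on any OS.
one, five, fifteen = parse_loadavg("0.50 1.00 2.00 1/123 4567")
print(load_as_percentage(fifteen, 4))  # load 2.00 on 4 CPUs -> 50.0
```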





On 09/16/2013 04:10 PM, Ladislav Smola wrote:

Hello,
this is a follow-up to T. Sedovic's earlier email, trying to identify all the metrics we will need to track for Tuskar. The Ceilometer API for Horizon is now in progress, so we have time to finish the list of metrics and alarms we need. That may also lead to requests for some Ceilometer API optimizations.
This is meant to open a conversation that will lead to the final list.


Measurements
=========

The old list sent by tsedovic:
-------------------------------------

* CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
* RAM utilisation (GB) (Ceilometer-Nova as memory)
- I assume this is the used value, and that the total value can be obtained from the service itself; needs confirmation
* Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)
- I assume this is the used value, and that the total value can be obtained from the service itself; needs confirmation
* Disk utilisation (GB) (Ceilometer-Cinder as volume.size and Ceilometer-Swift as storage.objects.size)
- I assume this is the used value, and that the total value can be obtained from the service itself; needs confirmation
* System load -- see /proc/loadavg (percentage) (--)
* Incoming traffic for each NIC (Mbps) (Ceilometer-Nova as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as network.outgoing.bytes)
- these are currently tied to the VM interfaces; I expect the Baremetal agent (Hardware agent) will use the physical NICs; needs confirmation
* Number of currently running instances and the associated flavours (Ceilometer-Nova using instance:<type> and group_by resource_id)
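To illustrate what "group_by resource_id" buys us here: the same instance emits many instance:<type> samples over time, so counting samples would overcount instances. Below is a local sketch of that grouping, assuming samples arrive as (meter name, resource_id) pairs; in practice the grouping would be done by the Ceilometer API query, and the function name is hypothetical.

```python
from collections import defaultdict


def instances_per_flavor(samples):
    """Count distinct instances per flavor from instance:<type> samples.

    samples: iterable of (meter_name, resource_id) pairs, e.g.
    ("instance:m1.small", "<instance uuid>").
    """
    ids_by_flavor = defaultdict(set)
    for meter, resource_id in samples:
        prefix, sep, flavor = meter.partition(":")
        if prefix != "instance" or not sep:
            continue  # not an instance:<type> meter, ignore it
        # Grouping by resource_id: repeated samples of one instance count once.
        ids_by_flavor[flavor].add(resource_id)
    return {flavor: len(ids) for flavor, ids in ids_by_flavor.items()}


samples = [
    ("instance:m1.small", "vm-a"),
    ("instance:m1.small", "vm-a"),  # second poll of the same instance
    ("instance:m1.small", "vm-b"),
    ("instance:m1.large", "vm-c"),
    ("cpu_util", "vm-a"),
]
print(instances_per_flavor(samples))  # {'m1.small': 2, 'm1.large': 1}
```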


The additional meters used in wireframes
-----------------------------------------------------
jcoufal, could you add the additional measurements from the latest wireframes?

The measurements the Ceilometer supports now
---------------------------------------------------------------

http://docs.openstack.org/developer/ceilometer/measurements.html
Feel free to include the others in the wireframes, jcoufal (I guess there will have to be different overview pages for different Resource Classes, based on their service type).
I am in the process of finding out whether all of these measurements will also be collected by the Baremetal agent (Hardware agent). Judging from its description, I would say yes (except for VM-specific metrics like vcpus, I guess).

The missing meters
-------------------------
We will probably have to implement these (meaning implementing pollsters for the Baremetal agent (Hardware agent) that will collect these metrics):
* System load -- see /proc/loadavg (percentage) (probably for all services?)
- Please add any other Baremetal metrics you think we will need.


Alerts
====

Setting an Alarm
-----------------------

Simplified explanation of setting the alarm:
In order to have alerts, you have to set an alarm first. An alarm can contain any statistics query, a threshold and an operator (e.g. fire the alarm when avg cpu_util > 90% on all instances of project_1). Several alarms can be combined into one complex alarm, and alarms can be browsed.
(There can be actions set up on alarm, but more about that later.)
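A simplified model of how such a threshold alarm and a combined alarm could be evaluated is sketched below. It is a hypothetical illustration of the semantics described above, not Ceilometer's alarm evaluator; the class, the statistic names and the operator names (in a gt/lt style) are assumptions.

```python
import operator
from dataclasses import dataclass
from statistics import mean

# Hypothetical operator and statistic names for this sketch.
OPERATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}
STATISTICS = {"avg": mean, "max": max, "min": min, "sum": sum, "count": len}


@dataclass
class ThresholdAlarm:
    meter: str        # e.g. "cpu_util"
    statistic: str    # key of STATISTICS
    comparison: str   # key of OPERATORS
    threshold: float

    def evaluate(self, values):
        """Return True if the alarm fires for the sample values of its query."""
        if not values:
            return False  # no data: treat as not firing in this sketch
        stat = STATISTICS[self.statistic](values)
        return OPERATORS[self.comparison](stat, self.threshold)


def combine(results, mode="and"):
    """Combine the states of several alarms into one complex alarm state."""
    if mode == "and":
        return all(results)
    if mode == "or":
        return any(results)
    raise ValueError(f"unknown combination mode: {mode}")


# "Fire when avg cpu_util > 90% on all instances of project_1" (values made up).
cpu_alarm = ThresholdAlarm("cpu_util", "avg", "gt", 90.0)
print(cpu_alarm.evaluate([95.0, 92.0, 88.0]))  # avg 91.67 > 90 -> True
```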

Showing alerts
-------------------
1. I would go as far as distinguishing system meters (e.g. something like cpu_util > 90%, as used for Heat autoscaling) from user-defined meters (the ones defined in the UI). Will we show both in the UI? Probably in different sections. System meters will require extra caution.
2. I would see the table view of alarms as a general filterable, orderable table of alarms, so we can easily show e.g. all nova alarms, or all alarms for cpu_util with condition > 90%.
3. There is an ongoing conversation with eglynn about how to show the 'aggregate alarm stats'
and 'alarm time series':
https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion
Next to the overview page with predefined charts, we should have general filterable, orderable
charts (a similar interface to the table view above).
One possible way the alarm charts could look on the overview page is pictured here: http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf. Any feedback is welcome. We should also figure out which alarms will be used to define that something bad is happening (like a health chart?), and which alarms to set and show by default (a lot of them
are already being set by e.g. Heat).
4. A lot of the alerts used in the wireframes are not currently supported in Ceilometer (alerts can only be based on existing measurements), e.g. instance failures, disk failures, etc. We should write those down and probably write agents and pollsters for them. It makes sense to integrate them into Ceilometer,
whatever they turn out to be.
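The general filterable, orderable alarm table from point 2 could be backed by something as simple as the sketch below. The alarm fields (name, service, meter, comparison, threshold) and the filter_and_sort helper are hypothetical; real alarm data would come from the Ceilometer alarm API.

```python
def filter_and_sort(alarms, order_by="name", descending=False, **criteria):
    """Return the alarms matching every field=value criterion, ordered by one field."""
    matches = [a for a in alarms if all(a.get(k) == v for k, v in criteria.items())]
    return sorted(matches, key=lambda a: a[order_by], reverse=descending)


alarms = [
    {"name": "web-cpu-high", "service": "nova", "meter": "cpu_util",
     "comparison": "gt", "threshold": 90.0},
    {"name": "db-cpu-high", "service": "nova", "meter": "cpu_util",
     "comparison": "gt", "threshold": 90.0},
    {"name": "volume-large", "service": "cinder", "meter": "volume.size",
     "comparison": "gt", "threshold": 500.0},
]

# All nova alarms, ordered by name.
print([a["name"] for a in filter_and_sort(alarms, service="nova")])
# -> ['db-cpu-high', 'web-cpu-high']

# All alarms for cpu_util with condition > 90%.
print(len(filter_and_sort(alarms, meter="cpu_util", comparison="gt", threshold=90.0)))
# -> 2
```

Keeping the filter generic (any field=value pair) fits the point about dynamic Ceilometer: new meters or alarm attributes need no new UI code.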


Dynamic Ceilometer
============
Due to Ceilometer's dynamic architecture, any user can add their own agent or pollster, which will give them new metrics. We should account for that: when showing charts of alarms or stats, nothing should
be hardcoded.
E.g. a user may define their own alarm (maybe on their own metrics) and want to build a health chart from this alarm on their Overview page. So there should only be default overview pages that can be modified and reset back to the default. That way users can define for themselves e.g. the bad behaviour they want to show.
Though this seems more like a distant-future feature, we should think about it at least a bit.
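A rough sketch of the "default overview page that can be modified and reset" idea follows. Charts reference meters or alarms by name rather than being hardcoded, so user-defined meters and alarms fit in the same way. The layout structure and the OverviewPage class are hypothetical.

```python
import copy

# Hypothetical default layout: charts reference meters or alarms by name only.
DEFAULT_OVERVIEW = {
    "charts": [
        {"title": "CPU utilisation", "source": "meter", "name": "cpu_util"},
        {"title": "Disk utilisation", "source": "meter", "name": "disk.size.used"},
    ]
}


class OverviewPage:
    """An overview page the user can customise and later reset to the default."""

    def __init__(self, defaults):
        self._defaults = copy.deepcopy(defaults)
        self.layout = copy.deepcopy(defaults)

    def add_chart(self, title, source, name):
        # Nothing is hardcoded: any meter or alarm name, including user-defined ones.
        self.layout["charts"].append({"title": title, "source": source, "name": name})

    def reset(self):
        # Discard the user's changes and restore an independent copy of the defaults.
        self.layout = copy.deepcopy(self._defaults)


page = OverviewPage(DEFAULT_OVERVIEW)
page.add_chart("My health", "alarm", "my-custom-alarm")  # user-defined alarm
print(len(page.layout["charts"]))  # 3
page.reset()
print(len(page.layout["charts"]))  # 2
```

Deep copies keep the user's edits from leaking into the shared defaults, so a reset always restores the original layout.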
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
