Confirmation of the metrics collected by the Hardware agent (Baremetal agent)
=============================================================================

It collects:

- CPU, memory space, disk space and network traffic (the same agent will run on all services, collecting the same data)

It should run on:

- the physical servers on which Glance, Cinder, Quantum, Swift, the Nova compute nodes and the Nova controller run
- the network devices used in the OpenStack environment (switches, firewalls, ...)

Supported metrics
------------------------

* CPU utilisation for each CPU (percentage) (as cpu.util.1min, cpu.util.5min, cpu.util.15min)
* RAM utilisation (GB) (as memory.size.total, memory.size.used)
* Disk utilisation (GB) (as disk.size.total, disk.size.used)
* Incoming traffic for each NIC (Mbps) (as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (as network.outgoing.bytes)

  - also track network.outgoing.errors and network.bandwidth.bytes

* Swap utilisation (GB)

  - this should be part of Disk utilisation; we will just have to recognize the swap disk

* Number of currently running instances and the associated flavours (Ceilometer-Nova using
  instance:<type> and group_by resource_id)

  - this info will be queried from the Overcloud Ceilometer
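
The list above asks for traffic in Mbps, while network.incoming.bytes and network.outgoing.bytes are byte counts, and RAM and disk come as used/total pairs. Below is a minimal Python sketch of the arithmetic a UI would need, assuming the byte meters are cumulative counters read at known timestamps and that Mbps means 10^6 bits per second. The ``Sample`` class and both function names are hypothetical, made up for this example; they are not part of Ceilometer.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One meter reading (hypothetical shape, not Ceilometer's own sample class)."""
    timestamp: float  # seconds since the epoch
    volume: float     # cumulative counter value at that moment


def bytes_counter_to_mbps(prev: Sample, curr: Sample) -> float:
    """Average rate in megabits per second between two cumulative byte readings."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_bytes = curr.volume - prev.volume
    if delta_bytes < 0:
        # The counter went backwards (e.g. interface or host restart): rate unknown.
        raise ValueError("counter reset detected")
    return delta_bytes * 8 / elapsed / 1_000_000


def utilisation_percent(used: float, total: float) -> float:
    """Percentage utilisation from a used/total pair such as memory.size.used/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


# 75,000,000 bytes in 60 s -> 75e6 * 8 / 60 / 1e6 = 10.0 Mbps
print(bytes_counter_to_mbps(Sample(0.0, 0.0), Sample(60.0, 75_000_000.0)))  # 10.0
# 3 GB used of 12 GB -> 25.0 %
print(utilisation_percent(3.0, 12.0))  # 25.0
```

Dividing the counter difference by the elapsed time gives the average rate over the polling interval, so short bursts get smoothed out; a negative difference is treated as a counter reset rather than reported as a bogus rate.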

Missing metrics
--------------------
* System load -- see /proc/loadavg (percentage)

As described in https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
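
Since system load is still missing, here is a rough sketch of how a Baremetal agent pollster could read it. It is plain Python, not the Ceilometer pollster plugin interface, and it assumes a Linux host. /proc/loadavg only provides load averages, so expressing them as a percentage is an assumption made here: the load is divided by the number of CPUs. The function names are hypothetical.

```python
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    # Format: "0.52 0.58 0.59 1/389 12345"; the first three fields are the averages.
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_as_percent(load: float, cpu_count: int) -> float:
    """Express a load average as a percentage of the available CPUs (assumed convention)."""
    return 100.0 * load / cpu_count


def read_system_load() -> dict[str, float]:
    """Read /proc/loadavg on a Linux host and return percentages keyed by window."""
    with open("/proc/loadavg") as f:
        one, five, fifteen = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return {
        "1min": load_as_percent(one, cpus),
        "5min": load_as_percent(five, cpus),
        "15min": load_as_percent(fifteen, cpus),
    }


print(parse_loadavg("2.00 1.00 0.50 1/389 12345"))  # (2.0, 1.0, 0.5)
print(load_as_percent(2.0, 4))                      # 50.0
```

For example, a load of 2.0 on a 4-CPU host comes out as 50%. Note that this "percentage" can exceed 100% when processes are queued waiting for a CPU.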




On 09/16/2013 04:10 PM, Ladislav Smola wrote:
Hello,

this is a follow-up to T. Sedovic's old email, trying to identify all the metrics we will need to track for Tuskar. The Ceilometer API for Horizon is now in progress, so we have time to finish the list of metrics and alarms we need. That may also raise requests for some Ceilometer API optimizations.

This is meant as an open conversation that will lead to the final list.


Measurements
============

The old list sent by tsedovic:
-------------------------------------

* CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
* RAM utilisation (GB) (Ceilometer-Nova as memory)

  - I assume this is the used value and that the total value can be obtained from the
    service itself; needs confirmation

* Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)

  - I assume this is the used value and that the total value can be obtained from the
    service itself; needs confirmation

* Disk utilisation (GB) (Ceilometer-Cinder as volume.size and Ceilometer-Swift as
  storage.objects.size)

  - I assume this is the used value and that the total value can be obtained from the
    service itself; needs confirmation

* System load -- see /proc/loadavg (percentage) (--)
* Incoming traffic for each NIC (Mbps) (Ceilometer-Nova as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as network.outgoing.bytes)

  - these are currently tied to the VM interfaces; I expect the Baremetal agent (Hardware
    agent) will use the physical NICs; needs confirmation

* Number of currently running instances and the associated flavours (Ceilometer-Nova
  using instance:<type> and group_by resource_id)
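
A sketch of the "group_by resource_id" idea for counting running instances per flavour, assuming we already have (meter name, resource_id) pairs for the instance:<type> meters from a recent time window (fetching them from the Ceilometer API is left out). The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def running_instances_by_flavour(samples):
    """Count distinct instances per flavour.

    `samples` is an iterable of (meter_name, resource_id) pairs, where meter_name
    follows the "instance:<type>" pattern, e.g. "instance:m1.small". Collecting
    resource_ids in a set mimics grouping by resource_id, so each instance is
    counted once no matter how many samples it produced.
    """
    instances = defaultdict(set)
    for meter_name, resource_id in samples:
        prefix, sep, flavour = meter_name.partition(":")
        if prefix != "instance" or not sep:
            continue  # not an instance:<type> meter
        instances[flavour].add(resource_id)
    return {flavour: len(ids) for flavour, ids in sorted(instances.items())}


samples = [
    ("instance:m1.small", "vm-1"),
    ("instance:m1.small", "vm-1"),  # repeated sample from the same instance
    ("instance:m1.small", "vm-2"),
    ("instance:m1.large", "vm-3"),
    ("cpu_util", "vm-3"),           # unrelated meter, ignored
]
print(running_instances_by_flavour(samples))  # {'m1.large': 1, 'm1.small': 2}
```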


The additional meters used in wireframes
-----------------------------------------------------

jcoufal, could you add the additional measurements from the latest wireframes?


The measurements the Ceilometer supports now
---------------------------------------------------------------

http://docs.openstack.org/developer/ceilometer/measurements.html

jcoufal, feel free to include the others in the wireframes (I guess there will have to be different overview pages for different Resource Classes, based on their service type).

I am in the process of finding out whether all of these measurements will also be collected by the Baremetal agent (Hardware agent). Judging from its description, I would say yes (except for VM-specific metrics like vcpus, I guess).

The missing meters
-------------------------

We will probably have to implement these (meaning implementing pollsters for the Baremetal
agent (Hardware agent) that will collect these metrics).

* System load -- see /proc/loadavg (percentage) (probably for all services?)

- Please add other Baremetal metrics you think we will need.


Alerts
======

Setting an Alarm
-----------------------

A simplified explanation of setting an alarm: to get alerts, you first have to set an alarm. An alarm can contain any statistics query, a threshold and a comparison operator (e.g. fire the alarm when the average cpu_util > 90% across all instances of project_1). Several alarms can be combined into one complex alarm, and alarms can be browsed.
(Actions can also be set up on an alarm, but more about that later.)
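
To make the alarm description above concrete, here is a minimal Python sketch of evaluating a threshold alarm (statistic, comparison operator, threshold) and combining several alarm states into one complex alarm. It is not Ceilometer's alarm evaluator; the class, the operator keys and the "and"/"or" combination modes are assumptions made for this example.

```python
import operator
from dataclasses import dataclass
from statistics import mean

# Comparison operators an alarm may use (keys chosen for this sketch).
OPERATORS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ne": operator.ne, "ge": operator.ge, "gt": operator.gt,
}

# Aggregations standing in for the "statistics query" part of an alarm.
STATISTICS = {"avg": mean, "max": max, "min": min, "sum": sum}


@dataclass
class ThresholdAlarm:
    meter: str        # e.g. "cpu_util"
    statistic: str    # key of STATISTICS
    comparison: str   # key of OPERATORS
    threshold: float

    def fires(self, values):
        """True when the aggregated values breach the threshold."""
        if not values:
            return False  # no data: treated as not firing (one possible policy)
        stat = STATISTICS[self.statistic](values)
        return OPERATORS[self.comparison](stat, self.threshold)


def combined_fires(alarm_states, mode="and"):
    """Combine several alarm states into one complex alarm ("and" or "or")."""
    return all(alarm_states) if mode == "and" else any(alarm_states)


# "Fire when avg cpu_util > 90% on all instances of project_1":
cpu_alarm = ThresholdAlarm("cpu_util", "avg", "gt", 90.0)
project_1_cpu_util = [95.0, 88.0, 97.0]     # hypothetical latest readings
print(cpu_alarm.fires(project_1_cpu_util))  # True (the average is 93.33...)
```

Treating "no data" as not firing is just one choice made for brevity; a real evaluator would rather report a separate "insufficient data" state.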

Showing alerts
-------------------

1. I would be bold enough to distinguish system meters (e.g. ones like cpu_util > 90% that are used for Heat autoscaling) from user-defined meters (the ones defined in the UI). Will we show both in the UI? Probably in different sections. System meters will require extra caution.

2. I would see the table view of alarms as a general filterable, orderable table, so we can easily show e.g. all Nova alarms, or all alarms for cpu_util with condition > 90%.

3. There is an ongoing conversation with eglynn about how to show the 'aggregate alarm stats'
   and the 'alarm time series':
   https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion

   Next to the overview page with predefined charts, we should have general filterable,
   orderable charts (with an interface similar to the table view above).

   One possible way the alarm charts could look on the overview page is pictured at
   http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf
   (any feedback is welcome). We should also figure out which alarms will be used to define
   that something bad is happening (like a health chart?), and which alarms to set and show
   by default (a lot of them are already being set by e.g. Heat).

4. Many of the alerts used in the wireframes are not currently supported in Ceilometer
   (alerts can only be based on existing measurements), e.g. instance failures, disk
   failures, etc. We should write those down and probably write agents and pollsters for
   them. It makes sense to integrate them into Ceilometer, whatever they turn out to be.
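
The generic filterable, orderable table from point 2 (and the charts from point 3) could be driven by something as small as the following sketch. The alarm rows and their field names are hypothetical, not the Ceilometer alarm schema.

```python
from operator import itemgetter

# Hypothetical alarm rows, one dict per table row.
ALARMS = [
    {"name": "nova-cpu-high", "service": "nova", "meter": "cpu_util",
     "comparison": "gt", "threshold": 90.0, "state": "alarm"},
    {"name": "nova-memory-high", "service": "nova", "meter": "memory",
     "comparison": "gt", "threshold": 80.0, "state": "ok"},
    {"name": "swift-objects-large", "service": "swift", "meter": "storage.objects.size",
     "comparison": "gt", "threshold": 1000.0, "state": "ok"},
    {"name": "autoscale-cpu-high", "service": "nova", "meter": "cpu_util",
     "comparison": "gt", "threshold": 90.0, "state": "ok"},
]


def filter_alarms(alarms, **criteria):
    """Keep the rows whose fields equal every given criterion."""
    return [a for a in alarms if all(a.get(k) == v for k, v in criteria.items())]


def order_alarms(alarms, column, descending=False):
    """Sort the rows by one column, like clicking a table header."""
    return sorted(alarms, key=itemgetter(column), reverse=descending)


# "All Nova alarms", in their original order:
nova = filter_alarms(ALARMS, service="nova")
print([a["name"] for a in nova])
# ['nova-cpu-high', 'nova-memory-high', 'autoscale-cpu-high']

# "All alarms for cpu_util with condition > 90%", ordered by name:
cpu_over_90 = order_alarms(
    filter_alarms(ALARMS, meter="cpu_util", comparison="gt", threshold=90.0), "name")
print([a["name"] for a in cpu_over_90])  # ['autoscale-cpu-high', 'nova-cpu-high']
```

Because the filter takes arbitrary field/value pairs, new columns (for example alarms built on user-defined meters) need no code changes.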


Dynamic Ceilometer
==================

Due to Ceilometer's dynamic architecture, any user can add their own agent or pollster, which will give them new metrics. We should account for that: when showing charts of alarms or stats, nothing should
be hardcoded.

E.g. a user will define their own alarm (perhaps on their own metrics) and want to build a health chart from it on their Overview page. So there should only be default overview pages that can be modified and reset back to the defaults. That way users themselves can define e.g. the bad behaviour they want to show.

Though this seems more like the distant future, we should think about it at least a bit.
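
A sketch of the "default overview page that can be modified and reset" idea, with charts validated against the meters the deployment reports at runtime rather than against a hardcoded list. The layout format and the ``OverviewPage`` class are hypothetical; in practice the set of available meters would come from Ceilometer (e.g. its meter listing in the API), which is not shown here.

```python
import copy

# Hypothetical default overview layout: a list of chart definitions.
DEFAULT_OVERVIEW = [
    {"title": "CPU", "meter": "cpu_util", "statistic": "avg"},
    {"title": "Memory", "meter": "memory", "statistic": "avg"},
]


class OverviewPage:
    """A per-user overview page that starts from the defaults and can be reset."""

    def __init__(self, defaults):
        self._defaults = copy.deepcopy(defaults)  # never mutated after this point
        self.charts = copy.deepcopy(defaults)

    def add_chart(self, title, meter, statistic, available_meters):
        # Validate against whatever meters the deployment currently reports,
        # so meters from user-added pollsters are accepted without code changes.
        if meter not in available_meters:
            raise ValueError(f"unknown meter: {meter}")
        self.charts.append({"title": title, "meter": meter, "statistic": statistic})

    def reset(self):
        """Throw away user modifications and go back to the default layout."""
        self.charts = copy.deepcopy(self._defaults)


available_meters = {"cpu_util", "memory", "my.custom.meter"}  # e.g. from a user's own pollster
page = OverviewPage(DEFAULT_OVERVIEW)
page.add_chart("My metric", "my.custom.meter", "max", available_meters)
print(len(page.charts))  # 3
page.reset()
print(len(page.charts))  # 2
```

Deep-copying the defaults keeps one user's modifications from leaking into the shared default layout or into other users' pages.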



_______________________________________________

OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
