Hi,

I have been looking into how to add process/service monitoring to tripleo. Specifically, I want to be able to detect when an OpenStack-dependent component deployed on an instance has failed. When a failure occurs, I want to be notified and eventually see it in Tuskar.
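To make "detect a failed component and get notified" concrete, here is a minimal sketch of a single check-and-notify pass. It rests on several assumptions. Services are identified by process name, matched against Linux `/proc/<pid>/comm`. This is a simplification, because comm names are truncated to 15 characters. The event types `process.check.ok` and `process.check.failed` are hypothetical. The message dict is only loosely modeled on the OpenStack notification envelope. `publish` defaults to printing JSON; a real deployment would presumably hand the message to the notification bus instead, for example via an oslo.messaging `Notifier`.

```python
import json
import os
import socket
import time
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional, Set


def running_process_names(proc_root: str = "/proc") -> Set[str]:
    """Return command names of running processes (Linux /proc only; names truncated to 15 chars)."""
    names: Set[str] = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # process exited (or is unreadable) between listdir() and open()
    return names


def build_notification(service: str, alive: bool, host: str) -> Dict:
    """Build a message loosely modeled on an OpenStack notification (hypothetical event types)."""
    return {
        "message_id": str(uuid.uuid4()),
        "publisher_id": f"process-monitor.{host}",
        "event_type": "process.check.ok" if alive else "process.check.failed",
        "priority": "INFO" if alive else "ERROR",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": {"service": service, "host": host, "alive": alive},
    }


def check_services(services: Iterable[str],
                   lister: Callable[[], Set[str]] = running_process_names,
                   host: str = "") -> List[Dict]:
    """Check each expected service once and return one notification per service."""
    running = lister()
    host = host or socket.gethostname()
    return [build_notification(s, s in running, host) for s in services]


def monitor(services: List[str],
            publish: Callable[[Dict], None] = lambda n: print(json.dumps(n)),
            lister: Callable[[], Set[str]] = running_process_names,
            host: str = "",
            interval: float = 30.0,
            rounds: Optional[int] = None) -> None:
    """Run checks every `interval` seconds; `rounds=None` loops forever, like a daemon."""
    done = 0
    while rounds is None or done < rounds:
        for note in check_services(services, lister, host):
            publish(note)  # stand-in for sending to the notification bus
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

For example, `monitor(["nova-compute", "keystone"], rounds=1)` prints one JSON message per service. Keeping `lister` and `publish` injectable lets the check logic be tested without a real message bus or a real process table.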
Ceilometer doesn't handle this particular use case today, so I have been doing some research. There are many options out there that provide process checks: nagios, sensu, zabbix, and monit. I am a bit wary of pulling one of these into tripleo, because each of them brings increased operational and maintenance costs. Also, physical device monitoring is currently in the works for Ceilometer, which lessens the need for some of the other capabilities another monitoring tool would provide.

For the particular use case of monitoring processes/services, at a high level, I am considering writing a simple daemon to perform the checks. Checks and failures would be written out as messages to the notification bus, and interested parties like Tuskar or Ceilometer could subscribe to these messages. In general, does this sound like a reasonable approach?

There is also the question of how to configure, or figure out, which processes we are interested in monitoring. I need to do more research here, but I'm considering either looking at the elements listed by diskimage-builder or looking at the orc post-configure.d scripts to find services that are restarted.

I welcome your feedback and suggestions.

- Richard Su
