Hi,

Currently, we can manage VMs via the VM agents, but the services running inside the VMs are not easy to monitor. If we could run nagios/icinga probes from the host against the guest, that would let us monitor those services as well.

Lars, Dejan and I have been discussing this for some time, and there have been quite a few thoughts on how to implement it. We are now leaning towards a proposal from Lars. Let me introduce the idea here and see what you think about it.

First, we could add a resource agent class. The RAs belonging to this class wrap around nagios/icinga probes. They can be configured as special monitor operations for the VMs. The behavior should be:

1. The special monitor operations start working after the VM and the services inside it have started.

2. Any failure of these monitor operations is treated as a failure of the VM, which triggers recovery of the VM.

Let me show an example:

    primitive db-vm ocf:heartbeat:VirtualDomain \
        params config="db-vm" hypervisor="xen:///" \
            ip="192.168.1.122" \
        op monitor nagios:ftp interval="30s" params user="test"

The "nagios:ftp" specifies which monitor agent is used to monitor the VM. It is an optional attribute group expressing the "class/provider/type" of the monitor agent. If omitted, it defaults to the resource's own agent, "ocf:heartbeat:VirtualDomain" for this VM, in which case the monitor is a normal one like we usually configure. We can add more monitors, such as one of type "nagios:www", and so on.

We can specify particular "params" for a monitor. The "ip" is not actually a useful parameter for VirtualDomain itself; we put it there for the monitor operations to inherit, so that we don't have to specify it for each monitor separately.

Other issues:

- As we can see, there is a time window between the VM being started and the monitored service inside it being started. One solution is to add a "first-failure" flag to the monitor operation, which would let us ignore the *first* failures of a monitor until it has returned healthy once, unless a timeout expires first. Ideally, this could be handled in LRM.

- A limitation is that we would have to specify a different monitor interval value for each of the services within a VM. We could probably fix this in some way eventually.
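
To make the "first-failure" idea a bit more concrete, here is a rough Python sketch of how such a flag might behave. This is only an illustration under assumptions: the names FirstFailureFilter, grace_seconds and the "ok"/"pending"/"failed" results are made up for this example and are not part of LRM or Pacemaker, and the actual handling inside LRM could look quite different.

```python
from dataclasses import dataclass


@dataclass
class FirstFailureFilter:
    """Hypothetical helper: hide monitor failures until the probe has
    succeeded once, or until a grace period after VM start has expired."""

    grace_seconds: float       # assumed timeout for the service to come up
    started_at: float          # time (in seconds) at which the VM was started
    seen_healthy: bool = False

    def evaluate(self, probe_ok: bool, now: float) -> str:
        """Return "ok", "pending" (failure ignored) or "failed"."""
        if probe_ok:
            # From now on, every failure counts as a real failure.
            self.seen_healthy = True
            return "ok"
        within_grace = (now - self.started_at) < self.grace_seconds
        if not self.seen_healthy and within_grace:
            # Service has probably not finished starting inside the VM yet.
            return "pending"
        return "failed"
```

With a 60 second grace period, a failed probe 10 seconds after VM start would be reported as "pending"; once the probe has succeeded, or once 60 seconds have passed, a failed probe is reported as "failed" and would trigger recovery of the VM.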

Anyway, this is the most straightforward solution we have come up with so far (please correct me if I'm missing anything). It's open for discussion, and any comments and suggestions are welcome and appreciated.

Thanks,
  Gao,Yan

-- 
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org