Hi Andrew,

On 11/08/12 13:09, Andrew Beekhof wrote:
> On Tue, Nov 6, 2012 at 10:30 PM, Gao,Yan <y...@suse.com> wrote:
>> Hi,
>>
>> Currently, we can manage VMs via the VM agents. But the services running
>> within VMs are not very easy to monitor. If we could use
>> nagios/icinga probes from the host to the guest, that would allow us to
>> achieve this.
>>
>> Lars, Dejan and I have been discussing this for some time. There have
>> been quite a few thoughts on how to implement it. Now we are inclined
>> toward a proposal from Lars. Please let me introduce the idea here, and
>> see what you think about it.
>>
>> First, we could add a resource agent class. The RAs belonging to this
>> class wrap around nagios/icinga probes. They can be configured as
>> special monitor operations for the VMs. The behavior should be:
>>
>> 1. The special monitor operations start working after the VMs and the
>> services inside are started.
>>
>> 2. Any failure of the monitor operations is treated as the failure of
>> the VM, which triggers the recovery of the VM.
>>
>> Let me show an example:
>>
>> primitive db-vm ocf:heartbeat:VirtualDomain \
>>     params config="db-vm" hypervisor="xen:///" \
>>         ip="192.168.1.122" \
>>     op monitor nagios:ftp interval="30s" params user="test"
>>
>> The "nagios:ftp" specifies which monitor agent is used to monitor the
>> VM. It's an optional attribute group expressing "class/provider/type"
>> of the monitor agent, which defaults to "ocf:heartbeat:VirtualDomain"
>> for this VM (if so, the monitor would be a normal one like we usually
>> configure). We can add more monitors like "nagios:www" type and so on.
>
> What do you propose the XML should look like?

Should be like:

...
<op id="vm-monitor-30" name="monitor" class="nagios" type="ftp"
    interval="30s" ignore-first-failures="true">
  <instance_attributes id="vm-monitor-30-params">
    <nvpair id="vm-monitor-30-params-user" name="user" value="test"/>
  </instance_attributes>
</op>
...
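
To make the intended meaning of ignore-first-failures="true" a bit more concrete, here is a rough Python sketch of one possible reading: the first monitor keeps retrying the probe and only reports a failure once the operation timeout is about to run out. This is only an illustration, not LRM code. The function name, the retry_delay parameter and the injectable clock/sleep arguments are all hypothetical; only the two OCF return code values are standard.

```python
import time

# Standard OCF return codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def monitor_ignoring_first_failures(probe, timeout, retry_delay=1.0,
                                    clock=time.monotonic, sleep=time.sleep):
    """Hypothetical helper: run `probe` until it succeeds or time runs out.

    `probe` is any callable returning an OCF-style return code.
    Failures seen before the first success are swallowed; if the probe
    never succeeds within `timeout` seconds, the last failure is returned.
    """
    deadline = clock() + timeout
    while True:
        rc = probe()
        if rc == OCF_SUCCESS:
            # First healthy result: from now on failures count normally.
            return OCF_SUCCESS
        if clock() + retry_delay > deadline:
            # No room for another attempt, so report the failure.
            return rc
        sleep(retry_delay)
```

After this call has returned OCF_SUCCESS once, later monitor runs would report any failure immediately, as a normal monitor does. If the probe never succeeds, the caller sees the failure only when the timeout is reached, which is why the monitor timeout needs a sensible value.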
>
>> We can specify particular "params" for a monitor. And the "ip" is
>> actually not a useful parameter for the VirtualDomain, we put it there
>> for its monitor operations to inherit, so that we don't have to specify
>> it for each monitor separately.
>
> You plan to add 'ip' to the VirtualDomain metadata?

It should be in the metadata of nagios:ftp and of the other monitor
agents. We'd like parameter inheritance to avoid repeating configuration.

>
>>
>>
>> Other issues:
>> - As we can see, there's some time window between when the VM is
>> started and when the monitored service has started. A solution is
>> adding a "first-failure" flag for the monitor operation, which could
>> allow us to ignore the *first* failures of a monitor until it has
>> returned healthy once, unless the time is out. Ideally, it could be
>> handled in LRM.
>
> What happens if there is never a first success?
>

The cluster will never find out. The monitor will reach the timeout and
return. I think we should give it a reasonable monitor timeout.

>
>>
>> - A limitation is we would have to specify different monitor interval
>> values for the services within a VM. Probably we could fix it in some
>> way eventually.
>>
>>
>> Anyway, this is the most straightforward solution we can think of so far
>> (Please correct me if I'm missing anything). It's open for discussion.
>> Any comments and suggestions are welcome and appreciated.
>
> Doesn't look too bad. Some finer points to discuss but I'm sure we
> can reach agreement.

Nice, thanks!

Regards,
  Gao,Yan
--
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org