Why not use the systemd and launchd facilities?
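
A minimal sketch of what that could look like with systemd, assuming the agent is wrapped in its own unit. The unit file name, the description, and the ExecStart path are hypothetical and not taken from the CloudStack packaging; only the Restart= and RestartSec= directives are the point here.

```ini
# /etc/systemd/system/cloudstack-agent.service  (hypothetical name and location)
[Unit]
Description=CloudStack KVM agent (illustrative unit)
After=network.target

[Service]
# Hypothetical start command; substitute whatever actually launches the agent.
ExecStart=/usr/local/bin/start-cloudstack-agent
# Restart the agent when it exits abnormally (non-zero exit code or killed by a signal).
Restart=on-failure
# Pause between restart attempts to avoid a tight crash loop.
RestartSec=5

[Install]
WantedBy=multi-user.target
```

With something like this in place the init system itself brings the agent back after a crash, so no separate monitoring daemon is needed on the host. On launchd the KeepAlive key in the job's plist plays a similar role.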

On Wed, Aug 7, 2013 at 8:28 PM, Edison Su <edison...@citrix.com> wrote:

> > -----Original Message-----
> > From: Wido den Hollander [mailto:w...@widodh.nl]
> > Sent: Wednesday, August 07, 2013 10:53 AM
> > To: dev@cloudstack.apache.org
> > Cc: shadow...@gmail.com
> > Subject: [KVM] Helper for agent during HA operations
> >
> > Hi,
> >
> > In our production setups we have seen some crashes of the KVM agent.
>
> If we can make sure the KVM agent is restarted immediately after a crash,
> then you don't need another separate service running on your KVM host.
> I'm not sure whether jsvc can automatically restart the agent. I remember
> we have a small C daemon program in the 3.0.x source code which can
> monitor the agent.
>
> > This could happen for all kinds of reasons, but that's not what I wanted
> > to discuss.
> >
> > Also see this issue:
> > https://issues.apache.org/jira/browse/CLOUDSTACK-3954
> >
> > What I've been writing for a PoC in our company is a small helper
> > written in Python which runs on port 8251.
> >
> > The Investigator can query this webservice (attached) which will simply
> > tell it which VMs are running on that host.
> >
> > It's online here: http://stack01.ceph.widodh.nl:8251/
> >
> > You can also do a query like this:
> > http://stack01.ceph.widodh.nl:8251/ping/i-2-6570-VM
> >
> > This way we can more reliably verify if a specific VM is still running
> > if the Agent stops responding for some reason. An ICMP echo-request
> > isn't safe since the Security Groups could prevent ICMP from coming
> > through.
> >
> > I'd rather not have the management server query libvirt directly, since
> > that would open a potential security hole. This webservice is read-only
> > and on my production setups I have libvirt listening on the private
> > bridge only.
> >
> > What do you think?
> >
> > Wido

--
Grtz,
Jörgen Maas