* Felipe Franciosi (fel...@nutanix.com) wrote:
> Hi David,
>
> > On Sep 30, 2019, at 3:29 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> >
> > * Felipe Franciosi (fel...@nutanix.com) wrote:
> >> Heyall,
> >>
> >> We have a use case where a host should self-fence (and all VMs should
> >> die) if it doesn't hear back from a heartbeat within a certain time
> >> period. Lots of ideas were floated around where libvirt could take
> >> care of killing VMs or a separate service could do it. The concern
> >> with those is that various failures could lead to _those_ services
> >> being unavailable and the fencing wouldn't be enforced as it should.
> >>
> >> Ultimately, it feels like Qemu should be responsible for this
> >> heartbeat and exit (or execute a custom callback) on timeout.
> >
> > It doesn't feel doing it inside qemu would be any safer; something
> > outside QEMU can forcibly emit a kill -9 and qemu *will* stop.
>
> The argument above is that we would have to rely on this external
> service being functional. Consider the case where the host is
> dysfunctional, with this service perhaps crashed and a corrupt
> filesystem preventing it from restarting. The VMs would never die.

Yeh that could fail.

> It feels like a Qemu timer-driven heartbeat check and calls abort() /
> exit() would be more reliable. Thoughts?

OK, yes; perhaps using a timer_create and telling it to send a fatal
signal is pretty solid; it would take the kernel to do that once it's
set.

IMHO the safer way is to kick the host off the network by reprogramming
switches; so even if the qemu is actually alive it can't get anywhere.

Dave

> Felipe
>
> >
> >> Does something already exist for this purpose which could be used?
> >> Would a generic Qemu-fencing infrastructure be something of interest?
> >
> > Dave
> >
> >
> >> Cheers,
> >> F.
> >>
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
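
For illustration only, here is a minimal sketch of the timer_create idea
mentioned in the reply: a one-shot POSIX timer whose expiry makes the
kernel deliver SIGKILL unless a heartbeat re-arms it first. It is not
QEMU code. The names fence_arm, fence_heartbeat and fence_demo are
hypothetical, and the sketch assumes Linux with POSIX timers and a kernel
that accepts SIGKILL as the timer's notification signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; none of these are QEMU APIs. */
static timer_t fence_timer;

/*
 * Re-arm the one-shot timer. Call this each time a heartbeat is heard;
 * if it is not called again within timeout_sec, the timer expires.
 */
int fence_heartbeat(time_t timeout_sec)
{
    struct itimerspec its;

    assert(timeout_sec > 0);
    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = timeout_sec; /* it_interval stays 0: one-shot */
    return timer_settime(fence_timer, 0, &its, NULL);
}

/*
 * Create the timer and arm it. On expiry the kernel sends SIGKILL to
 * this process (assumption: the kernel accepts SIGKILL here), which
 * cannot be caught, blocked or ignored.
 */
int fence_arm(time_t timeout_sec)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGKILL;
    if (timer_create(CLOCK_MONOTONIC, &sev, &fence_timer) != 0) {
        return -1;
    }
    return fence_heartbeat(timeout_sec);
}

/*
 * Demo: fork a child that arms a 1 second fence and never sends a
 * heartbeat. Returns the signal that terminated the child, or -1.
 */
int fence_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (fence_arm(1) != 0) {
            _exit(1);
        }
        for (;;) {
            pause(); /* simulate a process that hears no heartbeat */
        }
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In a real process the heartbeat receiver would call fence_heartbeat()
each time it hears back, and the process dies only if those calls stop.
SIGKILL leaves no room for cleanup, which is the point for fencing. Note
that POSIX timers are not inherited across fork(), and that glibc older
than 2.34 needs -lrt at link time.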