Hello,

Instead of commenting out 'echo b > /proc/sysrq-trigger' and disabling your HA at the same time, perhaps there's a way to tweak the timeouts to be more tolerant of slow NFS servers.
Can you go through the logs and see what happens before the reboot? I am not sure exactly which timeout the script cares about; it is worth investigating.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Andrija Panic" <andrija.pa...@gmail.com>
> To: dev@cloudstack.apache.org
> Sent: Friday, 9 October, 2015 10:25:05
> Subject: Re: slow nfs = reboot all hosts (((

> I managed this problem the following way:
> http://admintweets.com/cloudstack-disable-agent-rebooting-kvm-host/
>
> Cheers
> On Oct 9, 2015 10:21 AM, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>
>> Hello
>>
>> My issue is that whenever my NFS server becomes slow to respond, ACS just
>> bloody reboots ALL host servers, not just the ones running VMs with
>> volumes attached to the slow NFS server. Recently, I decided to remove
>> some of the old snapshots to free up some disk space. I deleted about a
>> dozen snapshots and was monitoring the NFS server for progress. At no
>> point did the NFS server lose connectivity; it just became a bit slow
>> and under load. By slow I mean I was still able to list files on the NFS
>> mount point and the ssh session was still working okay. It was just taking
>> a few more seconds to respond to NFS file listings, creation,
>> deletion, etc. However, the ACS agent rebooted every single host
>> server, killing all running guests and system VMs. In my case, I only have
>> two guests with volumes on the NFS server. The rest of the VMs are running
>> off RBD storage. Yet all host servers were rebooted, even those which were
>> not running guests with NFS volumes.
>>
>> Ever since I started using ACS, it has always been pretty dumb at
>> determining whether the NFS storage is still alive. I would say it has done the
>> maniac reboot-everything type of behaviour at least 5 times in the past 3
>> years. So, in the previous versions of ACS I just modified
>> kvmheartbeat.sh and hashed out the line with "reboot", as these reboots were
>> just pissing everyone off.
>>
>> After upgrading to ACS 4.5.x, that script has no reboot command, and I was
>> wondering if it is still possible to instruct the kvmheartbeat script not
>> to reboot the host servers?
>>
>> Thanks for your advice.
>>
>> Andrei