Re: slow nfs = reboot all hosts (((

Andrei Mikhailovsky Fri, 09 Oct 2015 08:48:06 -0700

Thanks guys, I am not sure how I missed that. Probably the coffee hadn't 
kicked in yet )))


Anyway, am I right in saying that the host server reboot is now forced, 
without stopping services or unmounting filesystems that may hold open and 
unsynced data? 
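
For context on the question above: in the Linux kernel's SysRq interface, writing 'b' to /proc/sysrq-trigger reboots immediately, without syncing or unmounting disks. A gentler ordering would be 's' (emergency sync), then 'u' (remount read-only), then 'b'. The following is only a sketch of that ordering. The function names, the DRY_RUN switch and the 2-second pause are assumptions made for illustration, and none of it is taken from kvmheartbeat.sh. It defaults to printing what it would do rather than acting.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the CloudStack kvmheartbeat.sh logic.
# SysRq keys: s = emergency sync, u = remount read-only, b = immediate reboot.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
DRY_RUN=${DRY_RUN:-1}   # assumption: default to printing; DRY_RUN=0 really acts

# send_sysrq KEY: write one SysRq key to the trigger, or just report it.
send_sysrq() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would write '$1' to $SYSRQ_TRIGGER"
    else
        echo "$1" > "$SYSRQ_TRIGGER"
    fi
}

# sync_then_reboot: sync, remount read-only, then reboot, in that order.
sync_then_reboot() {
    for key in s u b; do
        send_sysrq "$key"
        # Arbitrary pause so the sync/remount has a chance to finish.
        [ "$DRY_RUN" = "1" ] || sleep 2
    done
}
```

Calling sync_then_reboot with the defaults only prints the three steps. Note that an emergency sync may never complete if the storage underneath is hung, so whether this ordering is acceptable for HA fencing is a separate question.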

Isn't this rather bad and dangerous to do simply because one of possibly 
many NFS servers is slow or unresponsive? Not only that, the heartbeat also 
reboots servers that are not running VMs with NFS volumes. In my case it 
just rebooted every single host server. 

Very worrying indeed. 

Andrei 


----- Original Message -----

From: "Nux!" <[email protected]> 
To: [email protected] 
Sent: Friday, 9 October, 2015 12:58:19 PM 
Subject: Re: slow nfs = reboot all hosts ((( 

Hello, 

Instead of commenting out 'echo b > /proc/sysrq-trigger', which also disables 
your HA, perhaps there's a way to tweak the timeouts to be more generous with 
lazy NFS servers. 
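
To make "more generous" concrete, here is a minimal sketch of a heartbeat write that tolerates a slow NFS server: each attempt gets its own time limit, and failure is reported only after several attempts in a row have failed. It is not the kvmheartbeat.sh logic. The function names, argument order and every value passed in are hypothetical. It relies on `timeout` from GNU coreutils, and uses SIGKILL on the assumption that a writer stuck on NFS may ignore a plain SIGTERM.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the CloudStack kvmheartbeat.sh logic.

# write_heartbeat FILE TIMEOUT_SECS
# Try to write the current timestamp to FILE, killing the writer
# if it has not finished within TIMEOUT_SECS.
write_heartbeat() {
    timeout -s KILL "$2" sh -c 'date +%s > "$1"' _ "$1"
}

# check_with_retries FILE TIMEOUT_SECS MAX_TRIES PAUSE_SECS
# Returns 0 as soon as one write succeeds, 1 only if every attempt fails.
check_with_retries() {
    hb_file=$1; hb_timeout=$2; max_tries=$3; pause=$4
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if write_heartbeat "$hb_file" "$hb_timeout"; then
            return 0
        fi
        echo "heartbeat attempt $try/$max_tries failed" >&2
        # Pause between attempts, but not after the last one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$pause"
        fi
        try=$((try + 1))
    done
    return 1
}
```

For example, `check_with_retries /mnt/primary/hb 60 5 10` (path and numbers made up) would give the server up to 60 seconds per write and five attempts before declaring the storage dead. Only at that point would a caller decide whether to fence the host.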

Can you go through the logs and see what is happening before the reboot? I am 
not sure exactly which timeout the script cares about, but it is worth 
investigating. 

Lucian 

-- 
Sent from the Delta quadrant using Borg technology! 

Nux! 
www.nux.ro 

----- Original Message ----- 
> From: "Andrija Panic" <[email protected]> 
> To: [email protected] 
> Sent: Friday, 9 October, 2015 10:25:05 
> Subject: Re: slow nfs = reboot all hosts ((( 

> I managed this problem the following way: 
> http://admintweets.com/cloudstack-disable-agent-rebooting-kvm-host/ 
> 
> Cheers 
> On Oct 9, 2015 10:21 AM, "Andrei Mikhailovsky" <[email protected]> wrote: 
> 
>> Hello 
>> 
>> My issue is that whenever my NFS server becomes slow to respond, ACS just 
>> bloody reboots ALL host servers, not just the ones running VMs with 
>> volumes attached to the slow NFS server. Recently, I decided to remove 
>> some of the old snapshots to free up some disk space. I deleted about a 
>> dozen snapshots and monitored the NFS server for progress. At no 
>> point did the NFS server lose connectivity; it just became a bit slow 
>> under load. By slow I mean I was still able to list files on the NFS 
>> mount point and the ssh session was still working okay. It was just taking 
>> a few more seconds to respond to NFS file listings, creation, 
>> deletion, etc. However, the ACS agent rebooted every single host 
>> server, killing all running guests and system VMs. In my case, I only have 
>> two guests with volumes on the NFS server. The rest of the VMs are running 
>> off RBD storage. Yet all host servers were rebooted, even those which were 
>> not running guests with NFS volumes. 
>> 
>> Ever since I started using ACS, it has been pretty dumb at correctly 
>> determining whether the NFS storage is still alive. I would say it has 
>> done the maniac reboot-everything type of behaviour at least 5 times in the 
>> past 3 years. So, in previous versions of ACS I just modified 
>> kvmheartbeat.sh and commented out the line with "reboot", as these reboots 
>> were just pissing everyone off. 
>> 
>> After upgrading to ACS 4.5.x, that script no longer has a reboot command, 
>> and I was wondering whether it is still possible to instruct the 
>> kvmheartbeat script not to reboot the host servers. 
>> 
>> Thanks for your advice. 
>> 
>> Andrei
