Thanks Slavka, your proposal LGTM.

I think the scenario you described deserves a GitHub issue, so that it can be
fixed for setups where this setting is enabled.

Regards,
Nicolas Vazquez


From: Daan Hoogland <daan.hoogl...@gmail.com>
Date: Friday, 10 January 2025 at 05:30
To: dev@cloudstack.apache.org <dev@cloudstack.apache.org>
Cc: users <us...@cloudstack.apache.org>
Subject: Re: [DISCUSS]to disable the agent property setting 
`reboot.host.and.alert.management.on.heartbeat.timeout` by default

Your proposal seems sensible for the most part, Slavka. There is just one
concern I have with this:

> Such behaviour can create several problems. If the primary storage is
> temporarily not accessible, all hosts could reboot.


VMs become dysfunctional if their storage has been inaccessible and becomes
read-only. For most VMs this can be solved by rebooting and/or restarting
networks with cleanup. For the SSVM and CPVM it would be nice to have some
automatic way of restarting them, and the host reboot would cause that. As
this is a per-host setting, does it make sense to set it for those hosts where
we see such a system VM running?

Just a thought, not pertinent to your proposal I think.


On Fri, Jan 10, 2025 at 8:00 AM Slavka Peleva <slav...@storpool.com.invalid>
wrote:

> Hello everyone,
>
> I would like to open a discussion to change the default value of the agent
> property `reboot.host.and.alert.management.on.heartbeat.timeout` to false.
>
> The default behaviour of the KVM agent is to check the storage heartbeat
> and, if it times out, to check the
> `reboot.host.and.alert.management.on.heartbeat.timeout` property and restart
> the host if it is true. The default value of this property is true.
>
> This behaviour is independent of the host HA setting in management, and the
> agent will reboot the host even if the host HA is not enabled.
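
The decision described above can be sketched roughly as follows. This is only an
illustration and is not taken from the agent code: the function and parameter
names are hypothetical, and only the property name and its current default of
true come from this thread.

```python
def should_reboot_host(heartbeat_timed_out: bool,
                       reboot_on_timeout: bool = True) -> bool:
    """Return True when the agent, as described in this thread, would reboot.

    reboot_on_timeout stands in for the agent property
    `reboot.host.and.alert.management.on.heartbeat.timeout` (hypothetical
    parameter name; the default of True mirrors the current default).
    """
    # Host HA is deliberately not an input: per the description, the agent
    # reboots the host regardless of the host HA setting in management.
    return heartbeat_timed_out and reboot_on_timeout


# With the current default, a storage heartbeat timeout leads to a reboot.
print(should_reboot_host(heartbeat_timed_out=True))
# With the property set to false, the same timeout does not.
print(should_reboot_host(heartbeat_timed_out=True, reboot_on_timeout=False))
```

Note that nothing about host HA appears in the condition, which is the point
being made: the only inputs are the heartbeat result and the agent property.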
>
> Such behaviour can create several problems. If the primary storage is
> temporarily not accessible, all hosts could reboot.
>
> Another issue with HCI deployments is that if there is a temporary issue
> with the storage or with the heartbeat check, this will cause a cyclic
> reboot of all hosts, preventing the cluster from restoring its operational
> state.
>
> Note that this parameter is not part of the host HA mechanism. The
> CloudStack management server has other mechanisms to reboot and fence the
> host in case host HA is enabled.
>
> Self-rebooting the host by the agent has very specific use cases, if any,
> and is not suitable for typical setups. Thus, the proposal is to change
> the default value to false and leave it to the user to explicitly enable
> the agent to reboot the host. The proposal is expected to improve the
> overall availability of deployed CloudStack clouds.
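
For reference, explicitly opting out on a host today would presumably look like
the fragment below. The property name is from this thread; the file location
(commonly `/etc/cloudstack/agent/agent.properties`) and the need to restart the
agent afterwards are assumptions to verify against your installation.

```
# Agent properties file (location assumed, see above)
# Do not reboot the host when the storage heartbeat times out.
reboot.host.and.alert.management.on.heartbeat.timeout=false
```

If the default changes as proposed, the same line with `true` would be the
explicit opt-in.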
>
> Please let me know your thoughts about the proposal.
>
> Best regards,
>
> Slavka
>


--
Daan

 
