Copilot commented on code in PR #13090:
URL: https://github.com/apache/cloudstack/pull/13090#discussion_r3172805548
##########
agent/conf/agent.properties:
##########
@@ -310,6 +310,22 @@ iscsi.session.cleanup.enabled=false
# This parameter specifies if the host must be rebooted when something goes
wrong with the heartbeat.
#reboot.host.and.alert.management.on.heartbeat.timeout=true
+# Action taken by kvmheartbeat.sh / kvmspheartbeat.sh when a storage heartbeat
+# write fails persistently. Supersedes the legacy binary
+# 'reboot.host.and.alert.management.on.heartbeat.timeout' when set to a
non-default value.
+#
+# Allowed values:
+# reboot - immediate sysrq-trigger reboot (default; original
behavior)
+# graceful-reboot - 'systemctl reboot' instead of sysrq; allows VMs to stop
cleanly
+# restart-agent - restart cloudstack-agent only; running VMs are preserved
+# log-only - log + alert; take no automatic action (admin must
investigate)
Review Comment:
The `log-only` description says “log + alert”, but the heartbeat scripts
only write to syslog via `logger` and exit 0; there is no explicit
management-server alerting performed by this action. Consider rewording to
avoid implying an alert will be generated (or document exactly what “alert”
means here).
##########
agent/conf/agent.properties:
##########
@@ -310,6 +310,22 @@ iscsi.session.cleanup.enabled=false
# This parameter specifies if the host must be rebooted when something goes
wrong with the heartbeat.
#reboot.host.and.alert.management.on.heartbeat.timeout=true
+# Action taken by kvmheartbeat.sh / kvmspheartbeat.sh when a storage heartbeat
+# write fails persistently. Supersedes the legacy binary
+# 'reboot.host.and.alert.management.on.heartbeat.timeout' when set to a
non-default value.
+#
Review Comment:
The comment says this setting “supersedes”
`reboot.host.and.alert.management.on.heartbeat.timeout`, but in `KVMHAMonitor`
that boolean still gates whether the heartbeat script is invoked in fence mode
at all. If the boolean is set to `false`, changing `kvm.heartbeat.fence.action`
has no effect. Please reword to clarify the precedence/interaction (e.g., the
boolean remains a global bypass, and this setting only changes the action when
fencing is enabled).
##########
agent/src/main/java/com/cloud/agent/properties/AgentProperties.java:
##########
@@ -598,6 +598,25 @@ public class AgentProperties{
public static final Property<Boolean>
REBOOT_HOST_AND_ALERT_MANAGEMENT_ON_HEARTBEAT_TIMEOUT
= new
Property<>("reboot.host.and.alert.management.on.heartbeat.timeout", true);
+ /**
+ * Action taken by the KVM agent's storage heartbeat scripts
(kvmheartbeat.sh / kvmspheartbeat.sh)
+ * when a heartbeat write fails persistently. Allowed values:
+ * <ul>
+ * <li>{@code reboot} (default) — immediate sysrq-trigger reboot;
original behavior</li>
+ * <li>{@code graceful-reboot} — {@code systemctl reboot} instead of
sysrq, lets VMs stop cleanly</li>
+ * <li>{@code restart-agent} — restart cloudstack-agent only; running
VMs preserved</li>
+ * <li>{@code log-only} — log + alert, no automatic action</li>
Review Comment:
The Javadoc lists `log-only` as “log + alert”, but the implementation in the
heartbeat scripts only logs locally (syslog) and exits successfully; it doesn’t
directly generate a management-server alert. Please adjust the wording (or
clarify what alerting mechanism is expected) to avoid misleading operators.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]