We call this in pve-cluster.service as ExecStartPost. We prefix it
with '-' to tell systemd that it should ignore non-zero exit codes,
but if the command hangs (e.g., on IO) systemd kills it after a
timeout (90 seconds default) which then doesn't get ignored and the
unit will also be put in failure state and stopped.
We specifically do not want this to happen, so wrap the updatecerts
call in run_with_timeout and give it a maximum of 30 seconds to
finish.

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---
 data/PVE/CLI/pvecm.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/data/PVE/CLI/pvecm.pm b/data/PVE/CLI/pvecm.pm
index 3f0ac12..4a6e373 100755
--- a/data/PVE/CLI/pvecm.pm
+++ b/data/PVE/CLI/pvecm.pm
@@ -290,7 +290,9 @@ __PACKAGE__->register_method ({
     code => sub {
        my ($param) = @_;
 
-       PVE::Cluster::updatecerts_and_ssh($param->@{qw(force silent)});
+       PVE::Tools::run_with_timeout(30, sub {
+           PVE::Cluster::updatecerts_and_ssh($param->@{qw(force silent)});
+       });
 
        return undef;
     }});
-- 
2.17.1


_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Reply via email to