--- Begin Message ---
Hello,
We had an issue with a customer migrating a VM between nodes using our
shared storage solution.
On the target host, the OOM killer killed the main migration process, but
the child process (which actually performs the migration) kept running.
We did not expect that, and it caused some issues.
This leads us to a broader question: once a request has been submitted,
the parent can be terminated without returning a response to the client
while the child is still doing the work, so the request can be wrongly
retried or considered unfinished.
Should the child processes terminate together with the parent to guard
against this, or is this expected behavior?
Here is an example patch to do this:
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index bfde7e6..744fffc 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -13,8 +13,9 @@ use Fcntl qw(:flock);
use IO::File;
use IO::Handle;
use IO::Select;
-use POSIX qw(:sys_wait_h EINTR);
+use POSIX qw(:sys_wait_h EINTR SIGKILL);
use AnyEvent;
+use Linux::Prctl qw(set_pdeathsig);
use PVE::Exception qw(raise raise_perm_exc);
use PVE::INotify;
@@ -549,6 +550,9 @@ sub fork_worker {
POSIX::setsid();
}
+ # The signal that the calling process will get when its parent dies
+ set_pdeathsig(SIGKILL);
+
POSIX::close ($psync[0]);
POSIX::close ($ctrlfd[0]) if $sync;
POSIX::close ($csync[1]);
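
One caveat with this approach: the parent-death signal is only armed once
set_pdeathsig() has run in the child. If the parent dies between fork() and
that call, no signal is delivered. A standalone sketch of the usual guard
follows. fork_guarded_worker is a hypothetical helper written only for
illustration, it is not part of PVE::RESTEnvironment, and it assumes
Linux::Prctl is installed:

```perl
use strict;
use warnings;
use POSIX qw(SIGKILL _exit);
use Linux::Prctl qw(set_pdeathsig);

# Hypothetical helper for illustration only; not the PVE fork_worker code.
sub fork_guarded_worker {
    my ($code) = @_;

    my $parent_pid = $$;    # remember the parent's PID before forking

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);
    return $pid if $pid;    # parent: hand back the worker's PID

    # Child: ask the kernel to send SIGKILL when the parent dies.
    set_pdeathsig(SIGKILL);

    # If the parent died before set_pdeathsig() ran, no signal will ever
    # arrive, so detect the re-parenting explicitly and bail out.
    _exit(1) if getppid() != $parent_pid;

    # Run the actual work; exit 0 on success, 1 if it died.
    my $rc = eval { $code->(); 0 } // 1;
    _exit($rc);
}

# Usage: run a short task in a guarded worker and collect its exit status.
my $pid = fork_guarded_worker(sub { sleep 1 });
waitpid($pid, 0);
print "worker exit status: ", $? >> 8, "\n";
```

Note that, per prctl(2), the setting is cleared in children created by
fork(), so processes spawned by the worker itself would need the same
treatment.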
--- End Message ---
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel