> Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com> wrote on 21.05.2025 15:13 CEST:
> Hello,
>
> We had an issue with a customer migrating a VM between nodes using our
> shared storage solution.
>
> On the target host the OOM killer killed the main migration process, but
> the child process (which actually performs the migration) kept on
> working, which we did not expect, and that caused some issues.

could you be more specific about which process got killed?

when you do a migration, a task worker is forked and its UPID is returned
to the caller for further querying.

as part of the migration, other processes get spawned:
- ssh tunnel to the target node
- storage migration processes (on both nodes)
- VM state management CLI calls (on the target node)

which of those is the "main migration process"? which is the child
process?

> This leads us to the broader question - after a request is submitted,
> the parent can be terminated, and not return a response to the client,
> while the work is being done, and the request can be wrongly retried or
> considered unfinished.

the parent should return almost immediately, as all it is doing at that
point is returning the UPID to the client (the process then continues to
do other work, but that is no longer related to this task).

the only exceptions are "sync" task workers, like in a CLI context, where
the "parent" has no other work to do, so it waits for the child/task to
finish and prints its output while doing so, and some "bulk action" style
API calls that fork multiple task workers and poll them themselves.

> Should the child processes terminate together with the parent to guard
> against this, or is this expected behavior?

the parent (API worker process) and child (task worker process) have no
direct relation after the task worker has been spawned.

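to illustrate the fork-and-return pattern described above, here is a minimal sketch in Python (not the Perl code PVE actually uses). `spawn_task`, the status file, and the task id format are all hypothetical and only stand in for the task worker and its UPID; the real implementation may differ in every detail.

```python
import os
import time


def spawn_task(work, status_path):
    """Fork a detached worker and return a task id right away.

    The id format is invented for this sketch; it is not the real
    UPID layout.
    """
    pid = os.fork()
    if pid == 0:
        # Child (task worker): start its own session so it is not tied
        # to the parent's process group or controlling terminal.
        os.setsid()
        status = "OK"
        try:
            work()
        except Exception as err:
            status = f"ERROR: {err}"
        # Record the outcome where a holder of the task id can look it up.
        with open(status_path, "w") as fh:
            fh.write(status)
        # Leave without running the parent's cleanup handlers.
        os._exit(0)
    # Parent (API worker): return the id without waiting for the work.
    return f"task:{pid}:{int(time.time())}"
```

since the parent never waits on the worker in this sketch, killing the parent after `spawn_task` has returned does not stop the work; a caller would find out the result by querying the status with the returned id.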
> Here is an example patch to do this:
>
> > diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> > index bfde7e6..744fffc 100644
> > --- a/src/PVE/RESTEnvironment.pm
> > +++ b/src/PVE/RESTEnvironment.pm
> > @@ -13,8 +13,9 @@ use Fcntl qw(:flock);
> >  use IO::File;
> >  use IO::Handle;
> >  use IO::Select;
> > -use POSIX qw(:sys_wait_h EINTR);
> > +use POSIX qw(:sys_wait_h EINTR SIGKILL);
> >  use AnyEvent;
> > +use Linux::Prctl qw(set_pdeathsig);
> >
> >  use PVE::Exception qw(raise raise_perm_exc);
> >  use PVE::INotify;
> > @@ -549,6 +550,9 @@ sub fork_worker {
> >      POSIX::setsid();
> >  }
> >
> > +    # The signal that the calling process will get when its parent dies
> > +    set_pdeathsig(SIGKILL);

that has weird implications with regards to threads, so I don't think
that is a good idea..

> > +
> >      POSIX::close ($psync[0]);
> >      POSIX::close ($ctrlfd[0]) if $sync;
> >      POSIX::close ($csync[1]);

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel