On 2024-12-02 03:53, Jan Beulich wrote:
On 28.11.2024 17:45, Anthony PERARD wrote:
On Tue, Nov 26, 2024 at 12:19:40PM -0500, Jason Andryuk wrote:
When a VM transitioned to LIBXL_SHUTDOWN_REASON_SUSPEND, the xl daemon
was exiting as 0 = DOMAIN_RESTART_NONE "No domain restart".
Later, when the VM actually shut down, the missing xl daemon meant the
domain wasn't cleaned up properly.
Add a new DOMAIN_RESTART_SUSPENDED to handle the case. The xl daemon
keeps running to react to future shutdown events.
The domain death event needs to be re-enabled to catch subsequent
events. The libxl_evgen_domain_death is moved from death_list to
death_reported, and then it isn't found on subsequent iterations through
death_list. We enable the new event before disabling the old event, to
keep the xenstore watch active. If it is unregistered and
re-registered, it'll fire immediately for our suspended domain which
will end up continuously re-triggering.
Signed-off-by: Jason Andryuk <jason.andr...@amd.com>
Reviewed-by: Anthony PERARD <anthony.per...@vates.tech>
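
The ordering argument in the commit message (enable the new event before disabling the old one) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the libxl implementation: `struct watch`, `evgen_enable()`, `evgen_disable()` and the two `rearm_*()` helpers are hypothetical names, and the shared xenstore watch is modelled as a plain reference count that fires once whenever it is freshly registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the shared domain-death watch (hypothetical model). */
struct watch {
    int refs;        /* number of event generators currently enabled */
    int fired;       /* shutdown events reported because of a fresh registration */
    bool suspended;  /* the domain is sitting in the suspended state */
};

/* Enabling a generator registers the watch only when nobody holds it yet.
 * A fresh registration fires at once; for a suspended domain this is
 * reported as another shutdown event. */
static void evgen_enable(struct watch *w)
{
    if (w->refs++ == 0 && w->suspended)
        w->fired++;
}

/* Disabling the last generator unregisters the watch. */
static void evgen_disable(struct watch *w)
{
    assert(w->refs > 0);
    w->refs--;
}

/* Order described in the commit message: the watch never drops to zero. */
static void rearm_enable_first(struct watch *w)
{
    evgen_enable(w);
    evgen_disable(w);
}

/* Opposite order: the watch is unregistered and registered again. */
static void rearm_disable_first(struct watch *w)
{
    evgen_disable(w);
    evgen_enable(w);
}
```

In this model, starting from one enabled generator on a suspended domain, `rearm_enable_first()` leaves `fired` unchanged, while `rearm_disable_first()` increments it on every re-arm, which corresponds to the continuous re-triggering described above.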
While committing I was wondering: Does this want/need backporting (and hence
was it perhaps lacking a Fixes: tag)?
Thanks, Jan.
I don't think it's really worth backporting. Mainly, it hasn't been an
issue in the last 14 years. A Linux domU doesn't suspend itself - it
only does so in response to a xenstore watch. A domU *could* suspend
itself without the xenstore watch, but that doesn't seem to happen in
practice. Since xl could not generate those xenstore events before
`xl suspend` was introduced, this code path has not run and so has not
been an issue.
The tag would be:
Fixes: 1a0e17891f ("xl: support on_{poweroff,reboot,crash} domain configuration options.")
Regards,
Jason