On 2024-12-02 03:53, Jan Beulich wrote:
On 28.11.2024 17:45, Anthony PERARD wrote:
On Tue, Nov 26, 2024 at 12:19:40PM -0500, Jason Andryuk wrote:
When a VM transitioned to LIBXL_SHUTDOWN_REASON_SUSPEND, the xl daemon
was exiting as 0 = DOMAIN_RESTART_NONE "No domain restart".
Later, when the VM actually shut down, the missing xl daemon meant the
domain wasn't cleaned up properly.

Add a new DOMAIN_RESTART_SUSPENDED to handle the case.  The xl daemon
keeps running to react to future shutdown events.

The domain death event needs to be re-enabled to catch subsequent
events.  The libxl_evgen_domain_death is moved from death_list to
death_reported, and then it isn't found on subsequent iterations through
death_list.  We enable the new event before disabling the old event, to
keep the xenstore watch active.  If it is unregistered and
re-registered, it'll fire immediately for our suspended domain which
will end up continuously re-triggering.
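
As a rough illustration of the ordering described above, here is a toy model in C. It is not libxl code: `toy_ctx`, `toy_enable`, `toy_disable` and the two `refresh_*` helpers are hypothetical names. The sketch assumes that the watch is shared, that it is registered when the first event generator is enabled and dropped when the last one is disabled, and that a freshly registered watch fires once immediately.

```c
#include <assert.h>

/* Toy model only; every name here is hypothetical, none is libxl API. */
struct toy_ctx {
    int generators;   /* enabled death-event generators */
    int watch_active; /* 1 while the shared watch is registered */
    int fires;        /* times the watch fired because it was registered */
};

/* Enable one generator; register the shared watch if it is not active. */
static void toy_enable(struct toy_ctx *c)
{
    if (!c->watch_active) {
        c->watch_active = 1;
        c->fires++; /* assumed: a new watch fires once on registration */
    }
    c->generators++;
}

/* Disable one generator; drop the shared watch when none are left. */
static void toy_disable(struct toy_ctx *c)
{
    assert(c->generators > 0);
    if (--c->generators == 0)
        c->watch_active = 0;
}

/* Re-arm by enabling the new generator before disabling the old one.
 * Returns how many extra times the watch fired during the re-arm. */
static int refresh_enable_first(void)
{
    struct toy_ctx c = {0, 0, 0};
    int before;

    toy_enable(&c);       /* original generator */
    before = c.fires;
    toy_enable(&c);       /* new generator; watch stays registered */
    toy_disable(&c);      /* old generator; one generator remains */
    return c.fires - before;
}

/* Re-arm by disabling the old generator first, then enabling a new one.
 * Returns how many extra times the watch fired during the re-arm. */
static int refresh_disable_first(void)
{
    struct toy_ctx c = {0, 0, 0};
    int before;

    toy_enable(&c);       /* original generator */
    before = c.fires;
    toy_disable(&c);      /* last generator gone; watch is dropped */
    toy_enable(&c);       /* watch registered again and fires */
    return c.fires - before;
}
```

In this model `refresh_enable_first()` sees no extra firing, while `refresh_disable_first()` sees one, which corresponds to the re-triggering the commit message wants to avoid for a domain that is still suspended.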

Signed-off-by: Jason Andryuk <jason.andr...@amd.com>

Reviewed-by: Anthony PERARD <anthony.per...@vates.tech>

While committing I was wondering: Does this want/need backporting (and hence
was it perhaps lacking a Fixes: tag)?

Thanks, Jan.

I don't think it's really worth backporting. Mainly, it hasn't been an issue in the last 14 years. A Linux domU doesn't suspend itself - it only does so in response to a xenstore watch. A domU *could* suspend itself without the xenstore watch, but that doesn't seem to happen in practice. Since xl could not generate those xenstore events before `xl suspend` was introduced, this code path hasn't run and hasn't been an issue.

The tag would be:
Fixes: 1a0e17891f ("xl: support on_{poweroff,reboot,crash} domain configuration options.")

Regards,
Jason
