Juliet Kim <juli...@linux.vnet.ibm.com> writes:
> The commit
> (“powerpc/rtas: Fix a potential race between CPU-Offline & Migration)
> attempted to fix a hang in Live Partition Mobility(LPM) by abandoning
> the LPM attempt if a race between LPM and concurrent CPU offline was
> detected.
>
> However, that fix failed to notify Hypervisor that the LPM attempted
> had been abandoned which results in a system hang.

It is surprising to me that leaving a migration unterminated would cause
Linux to hang. Can you explain more about how that happens?

> Fix this by sending a signal PHYP to cancel the migration, so that PHYP
> can stop waiting, and clean up the migration.

This is well-spotted and rtas_ibm_suspend_me() needs to signal
cancellation in several error paths. But I don't agree that this is one
of them: this race is going to be a temporary condition in any
production setting, and retrying would allow the migration to succeed.
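
To make the retry-versus-cancel distinction concrete, here is a rough userspace sketch of one possible reading of it. This is not the kernel code: struct fake_platform, try_suspend(), cancel_migration() and suspend_with_retry() are all hypothetical names invented for illustration, and the race with CPU offline is simulated by a counter. The idea is that a transient -EAGAIN is retried, and cancellation is signalled only once the caller actually gives up.

```c
#include <errno.h>

/* Hypothetical model of the platform state; not a kernel structure. */
struct fake_platform {
    int offline_ops_pending; /* attempts that will still hit the race */
    int cancel_sent;         /* set once cancellation has been signalled */
    int attempts;            /* number of suspend attempts made */
};

/* Hypothetical suspend attempt: fails with -EAGAIN while a simulated
 * CPU-offline operation is still in flight, succeeds afterwards. */
static int try_suspend(struct fake_platform *p)
{
    p->attempts++;
    if (p->offline_ops_pending > 0) {
        p->offline_ops_pending--;
        return -EAGAIN;
    }
    return 0;
}

/* Hypothetical stand-in for telling the hypervisor to stop waiting. */
static void cancel_migration(struct fake_platform *p)
{
    p->cancel_sent = 1;
}

/* Retry while the failure is the transient race; signal cancellation
 * only if the final result is still an error. */
static int suspend_with_retry(struct fake_platform *p, int max_tries)
{
    int rc = -EAGAIN;

    for (int i = 0; i < max_tries; i++) {
        rc = try_suspend(p);
        if (rc != -EAGAIN)
            break;
    }
    if (rc)
        cancel_migration(p);
    return rc;
}
```

A real implementation would presumably also need to sleep or reschedule between attempts and bound the total wait; the sketch leaves that out.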