Currently we set the initial isolation/allocation state for DRCs associated with coldplugged LMBs to ISOLATED/UNUSABLE, respectively, under the assumption that the guest will move this state to UNISOLATED/USABLE.
In fact, this is only the case for LMBs added via hotplug. For coldplugged LMBs, the guest actually assumes the initial state to be UNISOLATED/USABLE. In practice, this only becomes an issue when we attempt to unplug one of these LMBs, where the guest kernel will issue an rtas-get-sensor-state call to check that the corresponding DRC is in an USABLE state before it will release the LMB back to QEMU. If the returned state is otherwise, the guest will assume no further action is needed, which bypasses the QEMU-side cleanup that occurs during the USABLE->UNUSABLE transition. This results in LMBs and their corresponding pc-dimm devices to stick around indefinitely. This patch fixes the issue by manually setting DRCs associated with cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug state untouched. As it turns out, this is analogous to the handling for cold-plugged CPUs in spapr_core_plug(). Cc: qemu-...@nongnu.org Cc: David Gibson <da...@gibson.dropbear.id.au> Cc: Bharata B Rao <bhar...@linux.vnet.ibm.com> Cc: Greg Kurz <gk...@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> --- This patch is a revision to the previously posted series: [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration That patchset introduced DRC migration as a means to address DRC state synchronization between source/target to allow memory unplug after migration. It turns out that unplug was also not working for LMBs that were defined on the source via '-device pc-dimm'. The real issue was that the default coldplug state on *both* the source and target was wrong, and by fixing that default state, we no longer need DRC migration for the scenario mentioned in the previous patchset since, incidentally, coldplug state on the target now matches post-hotplug DRC state on the source. This patch is very late for 2.8, but I'm hoping it can still make it in for rc3/2.8. I've tested forward/backward migration for all possible combinations (other than known case unsupported case 2.8->2.7) of v2.6/v2.7/v2.8, pseries-2.6/pseries-2.7/pseries-2.8, and modern-hotplug-events=true/false, with various combinations of CPU / LMB hotplug/unplug before/after migration. --- hw/ppc/spapr.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c3269c7..208ef7b 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2290,6 +2290,11 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size, drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp); addr += SPAPR_MEMORY_BLOCK_SIZE; + if (!dev->hotplugged) { + /* guests expect coldplugged LMBs to be pre-allocated */ + drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE); + drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED); + } } /* send hotplug notification to the * guest only in case of hotplugged memory -- 1.9.1