On Mon, Oct 09, 2017 at 06:11:36PM -0300, Daniel Henrique Barboza wrote: > LMB removal is completed only when the spapr_lmb_release callback > is called after all DRCs of the dimm are detached. During this > time, it is possible that a unplug request for the same dimm > arrives, trying to detach DRCs that were detached by the guest > in the first unplug_request. > > BQL doesn't help in this case - the lock will prevent any concurrent > removal from happening until the end of spapr_memory_unplug_request > only. What happens is that the second unplug_request ends up calling > spapr_drc_detach in a DRC that were detached already, causing an > assert error in spapr_drc_detach (e.g > https://bugs.launchpad.net/qemu/+bug/1718118). > > spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the > spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left > to be detached by the guest. When there are no more DRCs left, this > structure is deleted and the pc-dimm unplug handler is called to > finish the process. > > This patch reuses the sPAPRDIMMState to allow unplug_request to know > if there is an ongoing unplug process for a given dimm, aborting the > unplug request in this case, by doing the following changes: > > - in spapr_lmb_release callback, move the dimm state removal to the > end, after pc-dimm unplug handler. With this change we can check for > the existence of the dimm state to see if the unplug process is > done. > > - use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request > to check if the dimm state exists. If positive, there is an unplug > operation already in progress for this dimm, meaning that we should > abort it and warn the user about it. > > Fixes: https://bugs.launchpad.net/qemu/+bug/1718118 > Signed-off-by: Daniel Henrique Barboza <danie...@linux.vnet.ibm.com>
Thanks for the updated description. I've applied to ppc-for-2.11. I'm not totally certain the first change is necessary - because each of the pieces here should be called with the BQL, I think releasing the DIMMState should be atomic with the rest of the stuff in spapr_lmb_release(). But it makes it more obvious what's going on here, so that's fine anyway. > --- > hw/ppc/spapr.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index ff87f155d5..7f9274d57d 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -3054,14 +3054,13 @@ void spapr_lmb_release(DeviceState *dev) > return; > } > > - spapr_pending_dimm_unplugs_remove(spapr, ds); > - > /* > * Now that all the LMBs have been removed by the guest, call the > * pc-dimm unplug handler to cleanup up the pc-dimm device. > */ > pc_dimm_memory_unplug(dev, &spapr->hotplug_memory, mr); > object_unparent(OBJECT(dev)); > + spapr_pending_dimm_unplugs_remove(spapr, ds); > } > > static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev, > @@ -3090,6 +3089,19 @@ static void spapr_memory_unplug_request(HotplugHandler > *hotplug_dev, > goto out; > } > > + /* > + * An existing pending dimm state for this DIMM means that there is an > + * unplug operation in progress, waiting for the spapr_lmb_release > + * callback to complete the job (BQL can't cover that far). In this case, > + * bail out to avoid detaching DRCs that were already released. > + */ > + if (spapr_pending_dimm_unplugs_find(spapr, dimm)) { > + error_setg(&local_err, > + "Memory unplug already in progress for device %s", > + dev->id); > + goto out; > + } > + > spapr_pending_dimm_unplugs_add(spapr, nr_lmbs, dimm); > > addr = addr_start; -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature