Re: [Qemu-devel] [QEMU PATCH v5 5/6] migration: spapr: migrate ccs_list in spapr state

David Gibson Sun, 09 Oct 2016 23:05:32 -0700

On Fri, Oct 07, 2016 at 09:52:51AM -0500, Michael Roth wrote:
> Quoting David Gibson (2016-10-06 22:36:07)
> > On Mon, Oct 03, 2016 at 11:24:56AM -0700, Jianjun Duan wrote:
> > > ccs_list in spapr state maintains the device tree related
> > > information on the rtas side for hotplugged devices. In racing
> > > situations between hotplug events and migration operation, a rtas
> > > hotplug event could be migrated from the source guest to target
> > > guest, or the source guest could have not yet finished fetching
> > > the device tree when migration is started, the target will try
> > > to finish fetching the device tree. By migrating ccs_list, the
> > > target can fetch the device tree properly.
> > > 
> > > ccs_list is put in a subsection in the spapr state VMSD to make
> > > sure migration across different versions is not broken.
> > > 
> > > Signed-off-by: Jianjun Duan <du...@linux.vnet.ibm.com>
> > 
> > I'm still not entirely convinced we need to migrate the ccs_list.
> > What would happen if we did this:
> > 
> >    * Keep a flag which indicates whether the guest is in the middle of
> >      the configure_connector process.
> >        - I'm not sure if that would need to be a new bit of state, or
> >          if we could deduce it from the value of the isolation and
> >          allocation states
> >        - If it's new state, we'd need to migrate it, obviously not if
> >          we can derive it from other state flags
> > 
> >    * On the destination during post_load, if there was an in-progress
> >      configure_connector on the source, we set another "stale
> >      configure" flag
> > 
> >    * When a configure_connector call is attempted on the destination
> >      with the stale configure flag set, return an error
> > 
> > The question is, if we choose the right error, can we get the guest to
> > either restart the configure from scratch, or fail gracefully, so the
> > operator can restart the hotplug
> 
> To get the configure to restart, the guest's configure_connector
> implementation would need to changed. Current code in drmgr would just
> bail on any error, and I'd imagine the in-kernel version does the same.
> 
> So at least for existing guests, the only option is failing the command
> at the operator's interface, namely device_add. device_add is
> asynchronous to the actual hotplug event handling however. So if we want
> to convey failure to the user, it would have to be either in the form of
> a new QMP event emitted to convey hotplug success or error, or through
> device_add itself by implementing something like the async QMP rework
> that Marc-Andre posted some time back (which still seems to be a topic
> of debate). Which either approach, something like libvirt, with adequate
> state-tracking for pending hotplug events, could handle an error event
> on the target side post-migrate and convey that to the user somehow.
> 
> That's probably a much larger discussion if it comes to that, but doable
> in theory.
> 
> But even that wouldn't get us totally out of the woods: DRC state can
> still be modified outside of hotplug. For instance, a guest should be
> able to do:
> 
>   drmgr -c pci -s <drc_index> -r
>   drmgr -c pci -s <drc_index> -a
> 
> to return a device to firmware and then later take it back and
> reconfigure it. I'm not aware of any common case where this would occur,
> but it's not disallowed by the specification, and performing a migration
> between these 2 operations would currently break this since the default
> coldplug state on target assumes a configured state in source.


Thanks, that's a good case for why we need this.  Can you please fold
this description into your commit messages so it's there for posterity.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature

Re: [Qemu-devel] [QEMU PATCH v5 5/6] migration: spapr: migrate ccs_list in spapr state

Reply via email to