* Igor Mammedov (imamm...@redhat.com) wrote:
> On Tue, 13 Jun 2017 16:42:45 -0500
> Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>
> > Quoting Igor Mammedov (2017-06-09 03:27:33)
> > > On Thu, 08 Jun 2017 15:00:53 -0500
> > > Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> > >
> > > > Quoting David Gibson (2017-05-30 23:35:57)
> > > > > On Tue, May 30, 2017 at 06:04:45PM +0200, Laurent Vivier wrote:
> > > > > > For QEMU, a hotplugged device is a device added using the
> > > > > > HMP/QMP interface.
> > > > > > For SPAPR, a hotplugged device is a device added while the
> > > > > > machine is running. In this case QEMU doesn't update internal
> > > > > > state but relies on the OS for this part.
> > > > > >
> > > > > > In the case of migration, when we (libvirt) hotplug a device
> > > > > > on the source guest, we (libvirt) generally hotplug the same
> > > > > > device on the destination guest. But in this case, the machine
> > > > > > is stopped (RUN_STATE_INMIGRATE) and QEMU must not expect
> > > > > > the OS to manage it as a hotplugged device, as it will
> > > > > > be "imported" by the migration.
> > > > > >
> > > > > > This patch changes the meaning of "hotplugged" in spapr.c
> > > > > > to manage a QEMU hotplugged device like a "coldplugged" one
> > > > > > when the machine is awaiting an incoming migration.
> > > > > >
> > > > > > Signed-off-by: Laurent Vivier <lviv...@redhat.com>
> > > > >
> > > > > So, I think this is a reasonable concept, at least in terms of
> > > > > cleanliness and not doing unnecessary work. However, if it's
> > > > > fixing bugs, I suspect that means we still have problems
> > > > > elsewhere.
> > > >
> > > > I was hoping a lot of these issues would go away once we default
> > > > the initial/reset DRC states to "coldplugged". I think your
> > > > pending patch:
> > > >
> > > > "spapr: Make DRC reset force DRC into known state"
> > > >
> > > > would handle that part. But I didn't consider the fact that
> > > > libvirt will be issuing these hotplugs *after* reset, so those
> > > > states would indeed need to be fixed up again to reflect
> > > > boot-time,attached as opposed to boot-time,unattached before
> > > > starting the target.
> > > >
> > > > So I do think this patch addresses a specific bug that isn't
> > > > obviously fixable elsewhere.
> > > >
> > > > To me it seems like the only way to avoid doing something like
> > > > what this patch does is to migrate all attached DRCs from the
> > > > source in all cases.
> > > >
> > > > This would break backward migration though, unless we switch from
> > > > using subsections for DRCs to explicitly disabling DRC migration
> > > > based on machine type.
> > > we could leave old machines broken and fix only new machine types;
> > > then it would be easy to migrate 'additional' DRC state as a
> > > subsection, only for new machines.
> > That's an option, but subsections were only really used for backward
> > compatibility. Not sure how much we have to gain from using both.
> If I remember correctly, subsections could be/are used for forward
> compat stuff, i.e. a subsection is generated on the source side when
> its .needed callback returns true, and the destination will just
> consume whatever data were sent without looking at the .needed
> callback. So the source could generate an extra DRC subsection when
> CPU hotplug is enabled for new machine types, ex: f816a62daa
>
> adding David/Juan to CC list to correct me if I'm wrong.
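The mechanism Igor is describing here is an ordinary VMState subsection
gated by a .needed callback. A minimal sketch of that shape, with
invented names (SpaprDrcExtra, cpu_hotplug_enabled, and the field layout
are illustrative only, not the real spapr DRC state):

#include "qemu/osdep.h"
#include "migration/vmstate.h"

/* Hypothetical extra per-DRC state, sent only for new machine types. */
typedef struct SpaprDrcExtra {
    uint32_t state;            /* illustrative field                 */
    bool cpu_hotplug_enabled;  /* condition checked by .needed       */
} SpaprDrcExtra;

/* Evaluated on the source; the subsection is emitted only when this
 * returns true, so old machine types can simply never send it. */
static bool drc_extra_needed(void *opaque)
{
    SpaprDrcExtra *s = opaque;
    return s->cpu_hotplug_enabled;
}

static const VMStateDescription vmstate_drc_extra = {
    .name = "spapr_drc/extra",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = drc_extra_needed,
    .fields = (VMStateField[]) {
        VMSTATE_UINT32(state, SpaprDrcExtra),
        VMSTATE_END_OF_LIST()
    },
};

static const VMStateDescription vmstate_drc = {
    .name = "spapr_drc",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (VMStateField[]) {
        /* ... existing base fields ... */
        VMSTATE_END_OF_LIST()
    },
    .subsections = (const VMStateDescription * []) {
        &vmstate_drc_extra,
        NULL
    },
};

The destination does need the same subsection definition compiled in to
parse the data when it does arrive, which is the caveat raised in the
reply below.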
Yes I think that's right; but note that the destination does have to
know about the subsection definition to consume it. It can't consume a
subsection sent by a newer qemu which it doesn't have a definition of
and so can't parse.

Dave

> > > > That approach seems similar to what x86 does, e.g.
> > > > hw/acpi/ich9.c and hw/acpi/piix4.c migrate vmstate_memhp_state
> > > > (corresponding to all DIMMs' slot status) in all cases where
> > > > memory hotplug is enabled. If they were to do this using
> > > > subsections for DIMMs in a transitional state, I think similar
> > > > issues would pop up in that code as well.
> > > >
> > > > Even if we take this route, we still need to explicitly suppress
> > > > hotplug events during INMIGRATE to avoid extra events going on
> > > > the queue. *Unless* we similarly rely purely on the ones sent by
> > > > the source.
> > > pc/q35 might also lose events if a device is hotplugged during
> > > migration; in addition, migration would fail anyway since the dst
> > > qemu should be launched with all devices that are present on the
> > > src.
> > >
> > > ex: consider if one hotplugs a DIMM during migration: it creates a
> > > RAM region mapped into the guest, and that region might be
> > > transferred as part of VMState (not sure if it even works); and
> > > considering the dst qemu has no idea about the hotplugged memory
> > > mapping, the migration would fail on receiving unknown VMState.
> > >
> > > Hotplug generally doesn't work during migration, so it should be
> > > disabled in a generic way on migration start and re-enabled on the
> > > target on migration completion. How about blocking device_add while
> > > in the INMIGRATE state and unblocking it when switching to running
> > > on the dst?
> >
> > Maybe I'm misunderstanding the intent of this patch, but in our own
> > testing we've seen that even for CPUs hotplugged *before* migration
> > starts, libvirt will add them to the dest via device_add instead of
> > via the command line.
> The way migration currently works, this behavior seems fine to me:
> whether hotplugged CPUs on the target side are specified with -device
> or device_add, it shouldn't affect machine behavior. It doesn't in the
> case of CPUs on x86, where the hotplug process state is migrated as
> the VMSTATE_CPU_HOTPLUG subsection of the piix4_pm/ich9 device.
>
> > If the CPUs were all specified via the command line, I don't think
> > these patches would be needed, since the coldplug hooks would be
> > executed without the need to make any special considerations for
> > INMIGRATE.
> I don't think that it's libvirt's problem; the user is free to use
> either -device or device_add to add devices on the target side.
>
> We might re-enable writing to the hotplugged property (36cccb8c5)
> and ask mgmt to set its value on the target, but that would not work
> fine for old mgmt tools.
> Perhaps we should migrate the DeviceState::hotplugged property state
> as part of every device so that the target could fix up device state
> according to its value, but I'm not sure if it is useful.
>
> > This libvirt commit seems to confirm that the CPUs are added via
> > device_add, and we've seen similar behavior in our testing:
> >
> > commit 9eb9106ea51b43102ee51132f69780b2c86ccbca
> > Author: Peter Krempa <pkre...@redhat.com>
> > Date:   Thu Aug 4 14:36:24 2016 +0200
> >
> >     qemu: command: Add support for sparse vcpu topologies
> >
> >     Add support for using the new approach to hotplug vcpus using
> >     device_add during startup of qemu to allow sparse vcpu
> >     topologies.
> >
> >     There are a few limitations imposed by qemu on the supported
> >     configuration:
> >     - vcpu0 needs to be always present and not hotpluggable
> >     - non-hotpluggable cpus need to be ordered at the beginning
> >     - order of the vcpus needs to be unique for every single
> >       hotpluggable entity
> >
> >     Qemu also doesn't really allow to query the information
> >     necessary to start a VM with the vcpus directly on the
> >     commandline. Fortunately they can be hotplugged during startup.
> >
> >     The new hotplug code uses the following approach:
> >     - non-hotpluggable vcpus are counted and put to the -smp option
> >     - qemu is started
> >     - qemu is queried for the necessary information
> >     - the configuration is checked
> >     - the hotpluggable vcpus are hotplugged
> >     - vcpus are started
> >
> >     This patch adds a lot of checking code and enables the support
> >     to specify the individual vcpu element with qemu.
> >
> > So I don't think disabling hotplug during inmigrate is a possible
> > alternative unless we rework how libvirt handles this. The only
> > alternative to this patch that I'm aware of would be to always
> > migrate DRCs when dev->hotplugged == true.
> Currently I'd suggest looking into always migrating DRCs if CPU
> hotplug is enabled, even if dev->hotplugged is false (not nice but it
> might work).
> Consider:
> SRC1: hotplug CPU1 => CPU1.hotplugged = true
> DST1: -device CPU1 => CPU1.hotplugged = false
> so in the current code relying on CPU1.hotplugged would not work as
> expected; it works by accident because libvirt uses device_add on the
> target:
> DST1: device_add CPU1 => CPU1.hotplugged = true
>
> If we try to fix it by migrating the 'DeviceState::hotplugged' flag,
> we would need CPU/memory/machine-specific migration hooks to fix up
> device/machine state, as by the time the migration stream is processed
> on the target side, all devices are already wired up via the -device
> or device_add paths (cold/hotplugged paths).
> The approach doesn't seem robust to me.
>
> Maybe we should
> 1. make the DeviceState::hotplugged property write-able again
> 2. transmit DeviceState::hotplugged as part of the migration stream
> 3. add a generic migration hook which will check whether the target
>    and source values match; if they differ => fail/abort migration.
> 4. in case the values mismatch, mgmt will be forced to explicitly
>    provide the hotplugged property value on -device/device_add
> That would enforce a consistent DeviceState::hotplugged value on
> target and source.
> We can enforce it only for new machine types, so it won't break old
> mgmt tools with old machine types, but it would force mgmt for new
> machines to use the hotplugged property on the target so QEMU could
> rely on its value for migration purposes.

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
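For concreteness, steps 2 and 3 of Igor's proposal above could be
expressed with the ordinary VMState machinery. The sketch below is
illustrative only: HotpluggedCheck, the section name, and the way it
would be wired into each device's migration state are invented rather
than existing QEMU code, and the new-machine-type gating Igor mentions
is left out.

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "migration/vmstate.h"
#include "qemu/error-report.h"

/* Hypothetical wrapper: the value of dev->hotplugged as sent by the
 * source, plus the target-side device to compare it against. */
typedef struct HotpluggedCheck {
    DeviceState *dev;
    bool src_hotplugged;
} HotpluggedCheck;

/* Step 3 of the proposal: fail the migration if the source and target
 * disagree about whether the device was hotplugged. */
static int hotplugged_check_post_load(void *opaque, int version_id)
{
    HotpluggedCheck *c = opaque;

    if (c->src_hotplugged != c->dev->hotplugged) {
        error_report("device '%s': 'hotplugged' is %d on the source "
                     "but %d on the destination",
                     c->dev->id ? c->dev->id : "(unnamed)",
                     c->src_hotplugged, c->dev->hotplugged);
        return -EINVAL;  /* aborts the incoming migration */
    }
    return 0;
}

/* Step 2 of the proposal: the flag itself is the only field sent. */
static const VMStateDescription vmstate_hotplugged_check = {
    .name = "device/hotplugged-check",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = hotplugged_check_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_BOOL(src_hotplugged, HotpluggedCheck),
        VMSTATE_END_OF_LIST()
    },
};

Something would still have to register a section like this per device,
and only for new machine types, which is where the real work and the
compatibility handling discussed above would live.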