migration: Add VFIO migration pre-copy support

Alex Williamson Tue, 31 Jan 2023 20:15:27 -0800

On Tue, 31 Jan 2023 19:29:48 -0400
Jason Gunthorpe <j...@nvidia.com> wrote:


> On Tue, Jan 31, 2023 at 03:43:01PM -0700, Alex Williamson wrote:
> 
> > How does this affect our path towards supported migration?  I'm
> > thinking about a user experience where QEMU supports migration if
> > device A OR device B are attached, but not devices A and B attached to
> > the same VM.  We might have a device C where QEMU supports migration
> > with B AND C, but not A AND C, nor A AND B AND C.  This would be the
> > case if device B and device C both supported P2P states, but device A
> > did not. The user has no observability of this feature, so all of this
> > looks effectively random to the user.  
> 
> I think qemu should just log if it encounters a device without P2P
> support.

Better for debugging, but still poor from a VM management perspective.

> > Even in the single device case, we need to make an assumption that a
> > device that does not support P2P migration states (or when QEMU doesn't
> > make use of P2P states) cannot be a DMA target, or otherwise have its
> > MMIO space accessed while in a STOP state.  Can we guarantee that when
> > other devices have not yet transitioned to STOP?  
> 
> You mean the software devices created by qemu?

Other devices, software or otherwise, yes.

> > We could disable the direct map MemoryRegions when we move to a STOP
> > state, which would give QEMU visibility to those accesses, but besides
> > pulling an abort should such an access occur, could we queue them in
> > software, add them to the migration stream, and replay them after the
> > device moves to the RUNNING state?  We'd need to account for the lack of
> > RESUMING_P2P states as well, trapping and queue accesses from devices
> > already RUNNING to those still in RESUMING (not _P2P).  
> 
> I think any internal SW devices should just fail all accesses to the
> P2P space, all the time.
> 
> qemu simply acts like a real system that doesn't support P2P.
> 
> IMHO this is generally the way forward to do multi-device as well,
> remove the MMIO from all the address maps: VFIO, SW access, all of
> them. Nothing can touch MMIO except for the vCPU.

Are you suggesting this relative to migration or in general?  P2P is a
feature with tangible benefits and real use cases.  Real systems seem
to be moving towards making P2P work better, so it would seem short
sighted to move to and enforce only configurations w/o P2P in QEMU
generally.  Besides, this would require some sort of deprecation, so are
we intending to make users choose between migration and P2P?
 
> > This all looks complicated.  Is it better to start with requiring P2P
> > state support?  Thanks,  
> 
> People have built HW without it, so I don't see this as so good..

Are we obliged to start with that hardware?  I'm just trying to think
about whether a single device restriction is sufficient to prevent any
possible P2P or whether there might be an easier starting point for
more capable hardware.  There's no shortage of hardware that could
support migration given sufficient effort.  Thanks,

Alex

Re: [PATCH 01/18] vfio/migration: Add VFIO migration pre-copy support

Reply via email to