> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, September 17, 2019 10:54 PM
>
> On Tue, 17 Sep 2019 08:48:36 +0000
> "Tian, Kevin" <kevin.t...@intel.com> wrote:
>
> > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > Sent: Monday, September 16, 2019 4:33 PM
> > >
> > > On 2019/9/16 9:51 AM, Tian, Kevin wrote:
> > > > Hi, Jason
> > > >
> > > > We had a discussion about dirty page tracking in VFIO, when vIOMMU
> > > > is enabled:
> > > >
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg02690.html
> > > >
> > > > It's actually a similar model to vhost - Qemu cannot interpose the
> > > > fast-path DMAs and thus relies on the kernel part to track and report
> > > > dirty page information. Currently Qemu tracks dirty pages at GFN
> > > > level, thus demanding a translation from IOVA to GPA. The open
> > > > question in our discussion is where this translation should happen.
> > > > Doing the translation in the kernel implies a device iotlb flavor,
> > > > which is what vhost implements today. It requires potentially large
> > > > tracking structures in the host kernel, but leverages the existing
> > > > log_sync flow in Qemu. On the other hand, Qemu may perform log_sync
> > > > for every removal of an IOVA mapping and then do the translation
> > > > itself, avoiding the GPA awareness on the kernel side. That needs
> > > > some change to the current Qemu log_sync flow, and may bring more
> > > > overhead if IOVA is frequently unmapped.
> > > >
> > > > So we'd like to hear your opinions, especially about how you came
> > > > down to the current iotlb approach for vhost.
> > >
> > > We didn't consider this much when introducing vhost. Before IOTLB,
> > > vhost already knew GPAs through its mem table (GPA->HVA), so it was
> > > natural and easier to track dirty pages at GPA level, and it didn't
> > > require any changes to the existing ABI.
> >
> > This is the same situation as VFIO.
>
> It is?  VFIO doesn't know GPAs, it only knows HVA, HPA, and IOVA.  In
> some cases IOVA is GPA, but not all.
Well, I thought vhost had a similar design: the index of its mem table
is GPA when vIOMMU is off and becomes IOVA when vIOMMU is on. But I may
be wrong here. Jason, can you help clarify? I see two interfaces which
poke the mem table: VHOST_SET_MEM_TABLE (for GPA) and VHOST_IOTLB_UPDATE
(for IOVA). Are they used exclusively or together?

> > > For the VFIO case, the only advantage of using GPA is that the log
> > > can then be shared among all the devices that belong to the VM.
> > > Otherwise syncing through IOVA is cleaner.
> >
> > I still worry about the potential performance impact of this approach.
> > In the current mdev live migration series, multiple system calls are
> > involved when retrieving the dirty bitmap for a given memory range,
> > and IOVA mappings might change frequently. Though one may argue that
> > frequent IOVA changes already perform badly, it's still not good to
> > introduce further non-negligible overhead in such a situation.
> >
> > On the other hand, I realized that adding GPA awareness in VFIO is
> > actually easy. Today VFIO already maintains a full list of IOVAs and
> > their associated HVAs in vfio_dma structures, according to VFIO_MAP
> > and VFIO_UNMAP. As long as we allow those two operations to accept
> > another parameter (GPA), the IOVA->GPA mapping can be naturally cached
> > in the existing vfio_dma objects, which are always kept up to date by
> > the MAP and UNMAP ioctls. Qemu then uniformly retrieves the VFIO dirty
> > bitmap for the entire GPA range in every pre-copy round, regardless of
> > whether vIOMMU is enabled. There is no need for another IOTLB
> > implementation; the main ask is a v2 MAP/UNMAP interface.
> >
> > Alex, your thoughts?
>
> Same as last time, you're asking VFIO to be aware of an entirely new
> address space and implement tracking structures of that address space
> to make life easier for QEMU.  Don't we typically push such complexity
> to userspace rather than into the kernel?  I'm not convinced.  Thanks,

Is it really that complex? There is no need for a new tracking
structure. Just allow the MAP interface to carry a new parameter and
record it in the existing vfio_dma objects.

Note that the frequency of guest DMA map/unmap can be very high. We saw
more than 100K invocations per second with a 40G NIC. To do the
translation correctly, Qemu has to invoke log_sync for every unmap,
before the mapping for the logged dirty IOVAs becomes stale. In Kirti's
current patch series, each log_sync requires several system calls
through the migration info, e.g. setting start_pfn/page_size/total_pfns
and then reading data_offset/data_size. That design is fine for doing
log_sync once per pre-copy round, but too costly if we do it for every
IOVA unmap. If a small extension in the kernel can bring a large
reduction in overhead, why not?

Thanks
Kevin
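P.S. To make the "small extension" more concrete, below is a rough
sketch of what I have in mind. Please treat it as illustrative only:
the _v2 struct, the VFIO_DMA_MAP_FLAG_GPA flag and the gpa field are
made-up names for discussion, not a concrete uapi proposal.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical extension of struct vfio_iommu_type1_dma_map: the same
 * layout as today plus a trailing gpa field, guarded by a new flag so
 * old userspace keeps working (argsz/flags already allow this kind of
 * growth).
 */
struct vfio_iommu_type1_dma_map_v2 {
	__u32	argsz;
	__u32	flags;
#define VFIO_DMA_MAP_FLAG_GPA	(1 << 2)	/* hypothetical: gpa field is valid */
	__u64	vaddr;	/* process virtual address (HVA) */
	__u64	iova;	/* IO virtual address programmed by the (v)IOMMU */
	__u64	size;	/* size of mapping in bytes */
	__u64	gpa;	/* new: guest physical address backing this iova */
};

/*
 * Qemu side: called from the vIOMMU MAP notifier, where both the IOVA
 * and the GPA it translates to are already known.  The kernel would
 * simply record gpa in the matching vfio_dma object and drop it at
 * UNMAP time, so log_sync can report dirty bits directly in GPA terms.
 */
static int vfio_dma_map_with_gpa(int container_fd, uint64_t iova,
				 uint64_t gpa, uint64_t size, void *hva)
{
	struct vfio_iommu_type1_dma_map_v2 map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
			 VFIO_DMA_MAP_FLAG_GPA,
		.vaddr = (uintptr_t)hva,
		.iova  = iova,
		.size  = size,
		.gpa   = gpa,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

With something like that in place Qemu could retrieve the dirty bitmap
over the GPA range in every pre-copy round, with or without vIOMMU, and
no extra tracking structure would be needed in the kernel.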