* Alex Williamson (alex.william...@redhat.com) wrote:
> On Thu, 27 Jan 2022 08:30:13 +0000
> Stefan Hajnoczi <stefa...@redhat.com> wrote:
> 
> > On Wed, Jan 26, 2022 at 04:13:33PM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 26, 2022 at 08:07:36PM +0000, Dr. David Alan Gilbert wrote:  
> > > > * Stefan Hajnoczi (stefa...@redhat.com) wrote:  
> > > > > On Wed, Jan 26, 2022 at 05:27:32AM +0000, Jag Raman wrote:  
> > > > > > 
> > > > > >   
> > > > > > > On Jan 25, 2022, at 1:38 PM, Dr. David Alan Gilbert 
> > > > > > > <dgilb...@redhat.com> wrote:
> > > > > > > 
> > > > > > > * Jag Raman (jag.ra...@oracle.com) wrote:  
> > > > > > >> 
> > > > > > >>   
> > > > > > >>> On Jan 19, 2022, at 7:12 PM, Michael S. Tsirkin 
> > > > > > >>> <m...@redhat.com> wrote:
> > > > > > >>> 
> > > > > > >>> On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman 
> > > > > > >>> wrote:  
> > > > > > >>>> Allow PCI buses to be part of isolated CPU address spaces.
> > > > > > >>>> This has a niche usage.
> > > > > > >>>> 
> > > > > > >>>> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI
> > > > > > >>>> devices in the same machine/server. This would cause address
> > > > > > >>>> space collisions as well as be a security vulnerability.
> > > > > > >>>> Having separate address spaces for each PCI bus would solve
> > > > > > >>>> this problem.
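
If I'm reading the series right, the idea is essentially to give each
such bus its own AddressSpace instead of the shared system memory one.
A rough sketch of that wiring using QEMU's existing pci_setup_iommu()
hook (the isol_* and isolate_pci_bus names below are invented for
illustration, they are not the series' actual ones):

  /*
   * Sketch only: give a PCI bus a private address space rather than
   * the shared system memory address space.
   */
  #include "qemu/osdep.h"
  #include "exec/address-spaces.h"
  #include "hw/pci/pci.h"

  static MemoryRegion isol_root_mr;
  static AddressSpace isol_as;

  /* Every device on this bus sees the bus-private address space. */
  static AddressSpace *isol_bus_as(PCIBus *bus, void *opaque, int devfn)
  {
      return &isol_as;
  }

  static void isolate_pci_bus(PCIBus *bus)
  {
      memory_region_init(&isol_root_mr, NULL, "isolated-pci-root",
                         UINT64_MAX);
      address_space_init(&isol_as, &isol_root_mr, "isolated-pci-as");
      pci_setup_iommu(bus, isol_bus_as, NULL);
  }
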
> > > > > > >>> 
> > > > > > >>> Fascinating, but I am not sure I understand. Any examples?
> > > > > > >> 
> > > > > > >> Hi Michael!
> > > > > > >> 
> > > > > > >> multiprocess QEMU and vfio-user implement a client-server
> > > > > > >> model to allow out-of-process emulation of devices. The
> > > > > > >> client QEMU, which makes ioctls to the kernel and runs VCPUs,
> > > > > > >> could attach devices running in a server QEMU. The server
> > > > > > >> QEMU needs access to parts of the client’s RAM to perform
> > > > > > >> DMA.
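
(For what it's worth, the way the server gets at guest RAM is that the
client passes file descriptors for its memory regions over the
vfio-user socket and the server mmap()s them; very roughly, with a
simplified struct rather than the real wire format:)

  #include <stdint.h>
  #include <sys/mman.h>

  struct dma_region {
      uint64_t iova;    /* guest/DMA address the region starts at */
      uint64_t size;
      uint64_t offset;  /* offset into the fd */
      int      fd;      /* received over the socket via SCM_RIGHTS */
      void    *vaddr;   /* server-side mapping of the client's RAM */
  };

  /* Map a client RAM region so the server can "DMA" with plain memcpy. */
  static int map_client_ram(struct dma_region *r)
  {
      r->vaddr = mmap(NULL, r->size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, r->fd, r->offset);
      return r->vaddr == MAP_FAILED ? -1 : 0;
  }
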
> > > > > > > 
> > > > > > > Do you ever have the opposite problem? i.e. when an emulated PCI 
> > > > > > > device  
> > > > > > 
> > > > > > That’s an interesting question.
> > > > > >   
> > > > > > > exposes a chunk of RAM-like space (frame buffer, or maybe a
> > > > > > > mapped file) that the client can see.  What happens if two
> > > > > > > emulated devices need to access each other's emulated address
> > > > > > > space?
> > > > > > 
> > > > > > In this case, the kernel driver would map the destination’s
> > > > > > chunk of internal RAM into the DMA space of the source device.
> > > > > > Then the source device could write to that mapped address
> > > > > > range, and the IOMMU should direct those writes to the
> > > > > > destination device.
> > > > > > 
> > > > > > I would like to take a closer look at the IOMMU implementation
> > > > > > to see how to achieve this, and get back to you. I think the
> > > > > > IOMMU would handle this. Could you please point me to the IOMMU
> > > > > > implementation you have in mind?
> > > > > 
> > > > > I don't know if the current vfio-user client/server patches already
> > > > > implement device-to-device DMA, but the functionality is supported by
> > > > > the vfio-user protocol.
> > > > > 
> > > > > Basically: if the DMA regions lookup inside the vfio-user server
> > > > > fails, fall back to VFIO_USER_DMA_READ/WRITE messages instead.
> > > > > https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst#vfio-user-dma-read
> > > > > 
> > > > > Here is the flow:
> > > > > 1. The vfio-user server with device A sends a DMA read to QEMU.
> > > > > 2. QEMU finds the MemoryRegion associated with the DMA address
> > > > >    and sees it's a device.
> > > > >    a. If it's emulated inside the QEMU process then the normal
> > > > >       device emulation code kicks in.
> > > > >    b. If it's another vfio-user PCI device then the vfio-user
> > > > >       PCI proxy device forwards the DMA to the second vfio-user
> > > > >       server's device B.
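
So if I follow, servicing a VFIO_USER_DMA_READ on the client side ends
up being roughly the following (hand-waved, handle_dma_read() is a
made-up name): address_space_read() already dispatches to guest RAM,
to in-process device emulation, or to the vfio-user proxy's MMIO
handlers, which is what makes (a) and (b) transparent to the server.

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  /* Sketch: read len bytes at a DMA address on behalf of the server. */
  static MemTxResult handle_dma_read(AddressSpace *as, uint64_t addr,
                                     void *buf, uint64_t len)
  {
      /* 'as' is the address space the requesting device is wired into. */
      return address_space_read(as, addr, MEMTXATTRS_UNSPECIFIED,
                                buf, len);
  }
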
> > > > 
> > > > I'm starting to be curious if there's a way to persuade the guest kernel
> > > > to do it for us; in general is there a way to say to PCI devices that
> > > > they can only DMA to the host and not other PCI devices?  
> > > 
> > > 
> > > But of course - this is how e.g. VFIO protects host PCI devices from
> > > each other when one of them is passed through to a VM.  
> > 
> > Michael: Are you saying just turn on vIOMMU? :)
> > 
> > Devices in different VFIO groups have their own IOMMU context, so their
> > IOVA space is isolated. Just don't map other devices into the IOVA space
> > and those other devices will be inaccessible.
> 
> Devices in different VFIO *containers* have their own IOMMU context.
> Based on the group attachment to a container, groups can either have
> shared or isolated IOVA space.  That determination is made by looking
> at the address space of the bus, which is governed by the presence of a
> vIOMMU.
> 
> If the goal here is to restrict DMA between devices, i.e. peer-to-peer
> (p2p), why are we trying to re-invent what an IOMMU already does?

That was what I was curious about: is it possible to get an IOMMU to do
that, and if so how? (I don't know much about IOMMUs.)
In my DAX/virtiofs case, I want the device to be able to DMA to guest
RAM, but I don't want other devices trying to DMA to it, and in
particular I don't want it to have to DMA to other devices.
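
My (possibly wrong) mental model of how VFIO does this today is that a
device can only reach what has been mapped into its container's IOVA
space, so if only guest RAM is mapped, p2p accesses simply fault.
Something like the following (simplified, error handling dropped, the
function name is made up):

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* Map guest RAM, and only guest RAM, into the device's IOVA space. */
  static void map_guest_ram_only(int container, void *ram, size_t size,
                                 uint64_t iova)
  {
      struct vfio_iommu_type1_dma_map map;

      memset(&map, 0, sizeof(map));
      map.argsz = sizeof(map);
      map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
      map.vaddr = (uintptr_t)ram;  /* our virtual address of guest RAM */
      map.iova  = iova;            /* address the device uses for DMA */
      map.size  = size;
      ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
      /* Nothing else is mapped, so other devices' BARs aren't reachable. */
  }
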

>  In
> fact, it seems like an IOMMU does this better in providing an IOVA
> address space per BDF.  Is the dynamic mapping overhead too much?  What
> physical hardware properties or specifications could we leverage to
> restrict p2p mappings to a device?  Should it be governed by machine
> type to provide consistency between devices?  Should each "isolated"
> bus be in a separate root complex?  Thanks,

Dave

> Alex
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

