On Thu, 10 Apr 2025 09:07:51 -0700
Farhan Ali <al...@linux.ibm.com> wrote:

> On 4/3/2025 2:24 PM, Alex Williamson wrote:
> > On Thu, 3 Apr 2025 13:33:17 -0700
> > Farhan Ali <al...@linux.ibm.com> wrote:
> >  
> >> On 4/3/2025 11:05 AM, Alex Williamson wrote:  
> >>> On Thu, 3 Apr 2025 10:33:52 -0700
> >>> Farhan Ali <al...@linux.ibm.com> wrote:
> >>>     
> >>>> On 4/3/2025 9:27 AM, Alex Williamson wrote:  
> >>>>> On Thu, 3 Apr 2025 11:44:42 -0400
> >>>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >>>>>        
> >>>>>> On Thu, Apr 03, 2025 at 09:47:26AM +0200, Niklas Schnelle wrote:  
> >>>>>>> On Wed, 2025-04-02 at 11:51 -0400, Stefan Hajnoczi wrote:  
> >>>>>>>> On Tue, Apr 01, 2025 at 10:22:43AM -0700, Farhan Ali wrote:  
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Recently on s390x we have enabled mmap support for vfio-pci devices 
> >>>>>>>>> [1].  
> >>>>>>>> Hi Alex,
> >>>>>>>> I wanted to bring this to your attention. Feel free to merge it 
> >>>>>>>> through
> >>>>>>>> the VFIO tree, otherwise I will merge it once you have taken a look.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Stefan
> >>>>>>>>            
> >>>>>>>>> This allows us to take advantage of this support and use
> >>>>>>>>> userspace drivers on s390x. However, on s390x we have special
> >>>>>>>>> instructions for MMIO access. Starting with z15 (and newer
> >>>>>>>>> platforms) we have new PCI Memory I/O (MIO) instructions which
> >>>>>>>>> operate on virtually mapped PCI memory spaces, and can be used
> >>>>>>>>> from userspace. On older platforms we fall back to using
> >>>>>>>>> existing system calls for MMIO access.
> >>>>>>>>>
> >>>>>>>>> This patch series introduces support for the PCI MIO
> >>>>>>>>> instructions, and enables the userspace NVMe driver on s390x.
> >>>>>>>>> I would appreciate any review/feedback on the patches.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Farhan  
> >>>>>>> Hi Stefan,
> >>>>>>>
> >>>>>>> the kernel patch actually made it into Linus' tree for v6.15 already 
> >>>>>>> as
> >>>>>>> commit aa9f168d55dc ("s390/pci: Support mmap() of PCI resources except
> >>>>>>> for ISM devices") plus prerequisites. This went via the PCI tree
> >>>>>>> because they included a change to struct pci_dev and also enabled
> >>>>>>> mmap() on PCI resource files. Alex reviewed an earlier version and was
> >>>>>>> the one who suggested to also enable mmap() on PCI resources.  
> >>>>>> The introduction of a new QEMU API for accessing MMIO BARs in this
> >>>>>> series is something Alex might be interested in as QEMU VFIO 
> >>>>>> maintainer.
> >>>>>> That wouldn't have been part of the kernel patch review.
> >>>>>>
> >>>>>> If he's aware of the new API he can encourage other VFIO users to use 
> >>>>>> it
> >>>>>> in the future so that you won't need to convert them to work on s390x
> >>>>>> again.  
> >>>>> I don't claim any jurisdiction over the vfio-nvme driver.  In general
> >>>>> vfio users should be using either vfio_region_ops, ram_device_mem_ops,
> >>>>> or directly mapping MMIO into the VM address space.  The first uses
> >>>>> pread/write through the region offset, irrespective of the type of
> >>>>> memory, the second provides the type of access used here where we're
> >>>>> dereferencing into an mmap, and the last is of course the preferred
> >>>>> mechanism where available.
> >>>>>
> >>>>> It is curious that the proposal here doesn't include any changes to
> >>>>> ram_device_mem_ops for more generically enabling MMIO access on s390x.
> >>>>> Thanks,
> >>>>>
> >>>>> Alex  
> >>>> Hi Alex,
> >>>> From my understanding, ram_device_mem_ops sets up the BAR access
> >>>> for a guest passthrough device. Unfortunately, an s390x KVM guest
> >>>> today doesn't use or support these MIO instructions. We wanted to
> >>>> use this series as an initial test vehicle for the mmap support.  
> >>> Right, ram_device_mem_ops is what we'll use to access a BAR that
> >>> supports mmap but for whatever reason we're accessing it directly
> >>> through the mmap.  For instance if an overlapping quirk prevents the
> >>> page from being mapped to the VM or we have some back channel mechanism
> >>> where the VMM is interacting with the BAR.
> >>>
> >>> I bring it up here because it's effectively the same kind of access
> >>> you're adding with these helpers and would need to be addressed if this
> >>> were generically enabling vfio mmap access on s390x.  
> >> On s390x the use of the MIO instructions is limited to only PCI access.
> >> So I am not sure if we should generically apply this to all vfio mmap
> >> access (for non-PCI devices).
> >>
> >>  
> >>> Prior to commit 2b8fe81b3c2e ("system/memory: use ldn_he_p/stn_he_p")
> >>> the mmio helpers here might have been a drop-in replacement for the
> >>> dereferencing of mmap offsets, but something would need to be done
> >>> about the explicit PCI assumption introduced here and the possibility
> >>> of unaligned accesses that the noted commit tries to resolve.  Thanks,
> >>>
> >>> Alex  
> >> AFAICT in QEMU today ram_device_mem_ops is used for the non-PCI vfio
> >> mmap cases. For s390x these helpers should be restricted to PCI
> >> accesses. For the unaligned accesses (thanks for pointing out that
> >> commit!), are you suggesting we use the ld*_he_p/st*_he_p functions in
> >> the helpers I defined? Though those functions don't seem to be doing
> >> volatile accesses.  
> > TBH, it's not clear to me that 2b8fe81b3c2e is correct.  We implemented
> > the ram_device MemoryRegion specifically to avoid memory access
> > optimizations that are not compatible with MMIO, but I see that these
> > {ld,st}*_he_p operations are using __builtin_memcpy.  I'm not a
> > compiler aficionado, but is __builtin_memcpy guaranteed to use an
> > instruction set compatible with MMIO?
> >
> > Cc: folks related to that commit.
> >
> > The original issue that brought us ram_device was a very obscure
> > alignment of a memory region versus a device quirk only seen with
> > assignment of specific RTL NICs.
> >
> > The description for commit 4a2e242bbb30 ("memory: Don't use memcpy for
> > ram_device regions") also addresses unaligned accesses, we don't expect
> > drivers to use them and we don't want them to work differently in a VM
> > than they might on bare metal.  We can debate whether that's valid or
> > not, but that was the intent.
> >
> > Have we re-introduced the chance that we're using optimized
> > instructions only meant to target RAM here or is __builtin_memcpy
> > implicitly safe for MMIO?  Thanks,
> >
> > Alex  
> 
> 
> Hi Stefan, Alex
> 
> 
> Polite ping. Following up to understand how we should proceed with this
> series. Please let me know if there are any concerns that I haven't
> addressed.

I disassembled the current implementation using ldn_he_p/stn_he_p on
x86_64 and it doesn't appear to introduce any of the mmx/sse optimized
code that we were trying to get away from when we introduced the
ram_device MemoryRegion and moved away from memcpy.  I wish I had
some assurance that __builtin_memcpy won't invoke such operations, but
it seems unlikely that it would for the discrete, fundamental size
operations we're asking of it.  Therefore, maybe it is advisable to use
the ld*_he_p/st*_he_p helpers rather than open code the memory derefs.
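(For anyone following along, the he_p helpers reduce to fixed-size
__builtin_memcpy calls, roughly like the simplified sketch below -- this
is the shape of the helpers, not the actual QEMU source.)

```c
#include <stdint.h>
#include <string.h>

/* Simplified sketch of QEMU's host-endian, unaligned-pointer load/store
 * helpers.  Each copy is a compile-time-constant size, so the compiler
 * emits a single plain load or store of that width rather than a
 * variable-length memcpy call that could pick vectorized code paths. */
static inline uint32_t ldl_he_p(const void *ptr)
{
    uint32_t v;
    __builtin_memcpy(&v, ptr, sizeof(v));  /* one 4-byte load */
    return v;
}

static inline void stl_he_p(void *ptr, uint32_t v)
{
    __builtin_memcpy(ptr, &v, sizeof(v));  /* one 4-byte store */
}
```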

It's unfortunate that s390x needs to restrict this access specifically
to PCI memory, but maybe that means PCI-specific versions of these
helpers are created only for s390x and elsewhere #define'd to the
generic ld/st helpers, which suggests the main interface should be a
host_pci_{ld,st}n_he_p (maybe "le" given the implementation) type
function.  I don't know if we'd then create a ram_pci_device variant
of memory region ops for use in vfio-pci, but it should probably be
coded with that in mind.  Thanks,

Alex
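
(A rough sketch of the #define fallback suggested above -- the names,
the stub prototypes, and the 32-bit-only width are hypothetical and
purely illustrative, not code from the series:)

```c
#include <stdint.h>
#include <string.h>

/* Generic host-endian helpers, simplified as before. */
static inline uint32_t ldl_he_p(const void *ptr)
{
    uint32_t v;
    __builtin_memcpy(&v, ptr, sizeof(v));
    return v;
}

static inline void stl_he_p(void *ptr, uint32_t v)
{
    __builtin_memcpy(ptr, &v, sizeof(v));
}

#if defined(__s390x__)
/* On s390x these would be backed by the PCI MIO instructions, with a
 * syscall fallback on pre-z15 hardware.  Stubs shown for shape only. */
uint32_t host_pci_ldl_he_p(const void *ioaddr);
void host_pci_stl_he_p(void *ioaddr, uint32_t val);
#else
/* Everywhere else PCI MMIO is an ordinary mapped access, so the
 * PCI-specific names collapse to the generic helpers. */
#define host_pci_ldl_he_p(p)     ldl_he_p(p)
#define host_pci_stl_he_p(p, v)  stl_he_p((p), (v))
#endif
```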