On Thu, 3 Apr 2025 13:33:17 -0700
Farhan Ali <al...@linux.ibm.com> wrote:

> On 4/3/2025 11:05 AM, Alex Williamson wrote:
> > On Thu, 3 Apr 2025 10:33:52 -0700
> > Farhan Ali <al...@linux.ibm.com> wrote:
> >  
> >> On 4/3/2025 9:27 AM, Alex Williamson wrote:  
> >>> On Thu, 3 Apr 2025 11:44:42 -0400
> >>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >>>     
> >>>> On Thu, Apr 03, 2025 at 09:47:26AM +0200, Niklas Schnelle wrote:  
> >>>>> On Wed, 2025-04-02 at 11:51 -0400, Stefan Hajnoczi wrote:  
> >>>>>> On Tue, Apr 01, 2025 at 10:22:43AM -0700, Farhan Ali wrote:  
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Recently on s390x we have enabled mmap support for vfio-pci devices 
> >>>>>>> [1].  
> >>>>>> Hi Alex,
> >>>>>> I wanted to bring this to your attention. Feel free to merge it through
> >>>>>> the VFIO tree, otherwise I will merge it once you have taken a look.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Stefan
> >>>>>>         
> >>>>>>> This allows us to take advantage and use userspace drivers on s390x. 
> >>>>>>> However,
> >>>>>>> on s390x we have special instructions for MMIO access. Starting with 
> >>>>>>> z15
> >>>>>>> (and newer platforms) we have new PCI Memory I/O (MIO) instructions 
> >>>>>>> which
> >>>>>>> operate on virtually mapped PCI memory spaces, and can be used from 
> >>>>>>> userspace.
> >>>>>>> On older platforms we would fallback to using existing system calls 
> >>>>>>> for MMIO access.
> >>>>>>>
> >>>>>>> This patch series introduces support the PCI MIO instructions, and 
> >>>>>>> enables s390x
> >>>>>>> support for the userspace NVMe driver on s390x. I would appreciate 
> >>>>>>> any review/feedback
> >>>>>>> on the patches.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Farhan  
> >>>>> Hi Stefan,
> >>>>>
> >>>>> the kernel patch actually made it into Linus' tree for v6.15 already as
> >>>>> commit aa9f168d55dc ("s390/pci: Support mmap() of PCI resources except
> >>>>> for ISM devices") plus prerequisites. This went via the PCI tree
> >>>>> because they included a change to struct pci_dev and also enabled
> >>>>> mmap() on PCI resource files. Alex reviewed an earlier version and was
> >>>>> the one who suggested to also enable mmap() on PCI resources.  
> >>>> The introduction of a new QEMU API for accessing MMIO BARs in this
> >>>> series is something Alex might be interested in as QEMU VFIO maintainer.
> >>>> That wouldn't have been part of the kernel patch review.
> >>>>
> >>>> If he's aware of the new API he can encourage other VFIO users to use it
> >>>> in the future so that you won't need to convert them to work on s390x
> >>>> again.  
> >>> I don't claim any jurisdiction over the vfio-nvme driver.  In general
> >>> vfio users should be using either vfio_region_ops, ram_device_mem_ops,
> >>> or directly mapping MMIO into the VM address space.  The first uses
> >>> pread/write through the region offset, irrespective of the type of
> >>> memory, the second provides the type of access used here where we're
> >>> dereferencing into an mmap, and the last if of course the preferred
> >>> mechanism where available.
> >>>
> >>> It is curious that the proposal here doesn't include any changes to
> >>> ram_device_mem_ops for more generically enabling MMIO access on s390x.
> >>> Thanks,
> >>>
> >>> Alex  
> >>
> >> Hi Alex,
> >>   From my understanding the ram_device_mem_ops sets up the BAR access for
> >> a guest passthrough device. Unfortunately today an s390x KVM guest
> >> doesn't use and have support for these MIO instructions. We wanted to
> >> use this series as an initial test vehicle of the mmap support.  
> > Right, ram_device_mem_ops is what we'll use to access a BAR that
> > supports mmap but for whatever reason we're accessing it directly
> > through the mmap.  For instance if an overlapping quirk prevents the
> > page from being mapped to the VM or we have some back channel mechanism
> > where the VMM is interacting with the BAR.
> >
> > I bring it up here because it's effectively the same kind of access
> > you're adding with these helpers and would need to be addressed if this
> > were generically enabling vfio mmap access on s390x.  
> 
> On s390x the use of the MIO instructions is limited to only PCI access. 
> So i am not sure if we should generically apply this to all vfio mmap 
> access (for non PCI devices).
> 
> 
> >
> > Prior to commit 2b8fe81b3c2e ("system/memory: use ldn_he_p/stn_he_p")
> > the mmio helpers here might have been a drop-in replacement for the
> > dereferencing of mmap offsets, but something would need to be done
> > about the explicit PCI assumption introduced here and the possibility
> > of unaligned accesses that the noted commit tries to resolve.  Thanks,
> >
> > Alex  
> 
> AFAICT in qemu today the ram_device_mem_ops is used for non PCI vfio 
> mmap cases. For s390x these helpers should be restricted to PCI 
> accesses. For the unaligned accesses (thanks for pointing out that 
> commmit!), are you suggesting we use the ld*_he_p/st*_he_p functions in 
> the helpers i defined? Though those functions don't seem to be doing 
> volatile accesses.

TBH, it's not clear to me that 2b8fe81b3c2e is correct.  We implemented
the ram_device MemoryRegion specifically to avoid memory access
optimizations that are not compatible with MMIO, but I see that these
{ld,st}*_he_pe operations are using __builtin_memcpy.  I'm not a
compiler aficionado, but is __builtin_memcpy guaranteed to use an
instruction set compatible with MMIO?

Cc: folks related to that commit.

The original issue that brought us ram_device was a very obscure
alignment of a memory region versus a device quirk only seen with
assignment of specific RTL NICs.

The description for commit 4a2e242bbb30 ("memory: Don't use memcpy for
ram_device regions") also addresses unaligned accesses, we don't expect
drivers to use them and we don't want them to work differently in a VM
than they might on bare metal.  We can debate whether that's valid or
not, but that was the intent.

Have we re-introduced the chance that we're using optimized
instructions only meant to target RAM here or is __builtin_memcpy
implicitly safe for MMIO?  Thanks,

Alex


Reply via email to