On Thu, 3 Apr 2025 13:33:17 -0700 Farhan Ali <al...@linux.ibm.com> wrote:
> On 4/3/2025 11:05 AM, Alex Williamson wrote: > > On Thu, 3 Apr 2025 10:33:52 -0700 > > Farhan Ali <al...@linux.ibm.com> wrote: > > > >> On 4/3/2025 9:27 AM, Alex Williamson wrote: > >>> On Thu, 3 Apr 2025 11:44:42 -0400 > >>> Stefan Hajnoczi <stefa...@redhat.com> wrote: > >>> > >>>> On Thu, Apr 03, 2025 at 09:47:26AM +0200, Niklas Schnelle wrote: > >>>>> On Wed, 2025-04-02 at 11:51 -0400, Stefan Hajnoczi wrote: > >>>>>> On Tue, Apr 01, 2025 at 10:22:43AM -0700, Farhan Ali wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> Recently on s390x we have enabled mmap support for vfio-pci devices > >>>>>>> [1]. > >>>>>> Hi Alex, > >>>>>> I wanted to bring this to your attention. Feel free to merge it through > >>>>>> the VFIO tree, otherwise I will merge it once you have taken a look. > >>>>>> > >>>>>> Thanks, > >>>>>> Stefan > >>>>>> > >>>>>>> This allows us to take advantage and use userspace drivers on s390x. > >>>>>>> However, > >>>>>>> on s390x we have special instructions for MMIO access. Starting with > >>>>>>> z15 > >>>>>>> (and newer platforms) we have new PCI Memory I/O (MIO) instructions > >>>>>>> which > >>>>>>> operate on virtually mapped PCI memory spaces, and can be used from > >>>>>>> userspace. > >>>>>>> On older platforms we would fallback to using existing system calls > >>>>>>> for MMIO access. > >>>>>>> > >>>>>>> This patch series introduces support the PCI MIO instructions, and > >>>>>>> enables s390x > >>>>>>> support for the userspace NVMe driver on s390x. I would appreciate > >>>>>>> any review/feedback > >>>>>>> on the patches. > >>>>>>> > >>>>>>> Thanks > >>>>>>> Farhan > >>>>> Hi Stefan, > >>>>> > >>>>> the kernel patch actually made it into Linus' tree for v6.15 already as > >>>>> commit aa9f168d55dc ("s390/pci: Support mmap() of PCI resources except > >>>>> for ISM devices") plus prerequisites. This went via the PCI tree > >>>>> because they included a change to struct pci_dev and also enabled > >>>>> mmap() on PCI resource files. Alex reviewed an earlier version and was > >>>>> the one who suggested to also enable mmap() on PCI resources. > >>>> The introduction of a new QEMU API for accessing MMIO BARs in this > >>>> series is something Alex might be interested in as QEMU VFIO maintainer. > >>>> That wouldn't have been part of the kernel patch review. > >>>> > >>>> If he's aware of the new API he can encourage other VFIO users to use it > >>>> in the future so that you won't need to convert them to work on s390x > >>>> again. > >>> I don't claim any jurisdiction over the vfio-nvme driver. In general > >>> vfio users should be using either vfio_region_ops, ram_device_mem_ops, > >>> or directly mapping MMIO into the VM address space. The first uses > >>> pread/write through the region offset, irrespective of the type of > >>> memory, the second provides the type of access used here where we're > >>> dereferencing into an mmap, and the last if of course the preferred > >>> mechanism where available. > >>> > >>> It is curious that the proposal here doesn't include any changes to > >>> ram_device_mem_ops for more generically enabling MMIO access on s390x. > >>> Thanks, > >>> > >>> Alex > >> > >> Hi Alex, > >> From my understanding the ram_device_mem_ops sets up the BAR access for > >> a guest passthrough device. Unfortunately today an s390x KVM guest > >> doesn't use and have support for these MIO instructions. We wanted to > >> use this series as an initial test vehicle of the mmap support. > > Right, ram_device_mem_ops is what we'll use to access a BAR that > > supports mmap but for whatever reason we're accessing it directly > > through the mmap. For instance if an overlapping quirk prevents the > > page from being mapped to the VM or we have some back channel mechanism > > where the VMM is interacting with the BAR. > > > > I bring it up here because it's effectively the same kind of access > > you're adding with these helpers and would need to be addressed if this > > were generically enabling vfio mmap access on s390x. > > On s390x the use of the MIO instructions is limited to only PCI access. > So i am not sure if we should generically apply this to all vfio mmap > access (for non PCI devices). > > > > > > Prior to commit 2b8fe81b3c2e ("system/memory: use ldn_he_p/stn_he_p") > > the mmio helpers here might have been a drop-in replacement for the > > dereferencing of mmap offsets, but something would need to be done > > about the explicit PCI assumption introduced here and the possibility > > of unaligned accesses that the noted commit tries to resolve. Thanks, > > > > Alex > > AFAICT in qemu today the ram_device_mem_ops is used for non PCI vfio > mmap cases. For s390x these helpers should be restricted to PCI > accesses. For the unaligned accesses (thanks for pointing out that > commmit!), are you suggesting we use the ld*_he_p/st*_he_p functions in > the helpers i defined? Though those functions don't seem to be doing > volatile accesses. TBH, it's not clear to me that 2b8fe81b3c2e is correct. We implemented the ram_device MemoryRegion specifically to avoid memory access optimizations that are not compatible with MMIO, but I see that these {ld,st}*_he_pe operations are using __builtin_memcpy. I'm not a compiler aficionado, but is __builtin_memcpy guaranteed to use an instruction set compatible with MMIO? Cc: folks related to that commit. The original issue that brought us ram_device was a very obscure alignment of a memory region versus a device quirk only seen with assignment of specific RTL NICs. The description for commit 4a2e242bbb30 ("memory: Don't use memcpy for ram_device regions") also addresses unaligned accesses, we don't expect drivers to use them and we don't want them to work differently in a VM than they might on bare metal. We can debate whether that's valid or not, but that was the intent. Have we re-introduced the chance that we're using optimized instructions only meant to target RAM here or is __builtin_memcpy implicitly safe for MMIO? Thanks, Alex