On Mon, Feb 24, 2025 at 2:57 PM David Hildenbrand <da...@redhat.com> wrote:
>
> On 24.02.25 14:41, Albert Esteve wrote:
> > On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <da...@redhat.com> wrote:
> >>
> >> On 24.02.25 10:35, Albert Esteve wrote:
> >>> On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <da...@redhat.com> wrote:
> >>>>
> >>>> On 24.02.25 09:54, Albert Esteve wrote:
> >>>>> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <da...@redhat.com> wrote:
> >>>>>>
> >>>>>> On 17.02.25 17:40, Albert Esteve wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> looks like our debugging session was successful :)
> >>>>>>
> >>>>>> One question below.
> >>>>>>
> >>>>>>> v3->v4
> >>>>>>> - Change mmap strategy to use RAM blocks and subregions.
> >>>>>>> - Add new bitfield to qmp feature map.
> >>>>>>> - Followed most review comments from last iteration.
> >>>>>>> - Merged documentation patch again with this one. Makes more sense to review them together after all.
> >>>>>>> - Add documentation for MEM_READ/WRITE messages.
> >>>>>>>
> >>>>>>> The goal of this patch is to support dynamic fd-backed memory maps initiated from vhost-user backends. There are many devices that could already benefit from this feature, e.g., virtiofs or virtio-gpu.
> >>>>>>>
> >>>>>>> After receiving the SHMEM_MAP/UNMAP request, the frontend creates the RAMBlock from the fd and maps it by adding it as a subregion of the shared memory region container.
> >>>>>>>
> >>>>>>> The VIRTIO Shared Memory Region list is declared in the `VirtIODevice` struct to make it generic.
> >>>>>>>
> >>>>>>> TODO: There was a conversation on the previous version around adding tests to the patch (which I have acknowledged). However, given the numerous changes that the patch already has, I have decided to send it early and collect some feedback while I work on the tests for the next iteration. Given that I have been able to test the implementation with my local setup, I am more or less confident that, at least, the code is in a relatively sane state, so that no reviewing time is wasted on broken patches.
> >>>>>>>
> >>>>>>> This patch also includes:
> >>>>>>> - SHMEM_CONFIG frontend request, specifically meant to allow the generic vhost-user-device frontend to query VIRTIO Shared Memory settings from the backend (as this device is generic and agnostic of the actual backend configuration).
> >>>>>>>
> >>>>>>> - MEM_READ/WRITE backend requests, added to deal with a potential issue when multiple backends share a file descriptor. When a backend calls SHMEM_MAP, accessing the region fails for the other backends, as it is missing from their translation tables. These requests are a fallback for when vhost-user memory translation fails.
> >>>>>>
> >>>>>> Can you elaborate what the issue here is?
> >>>>>>
> >>>>>> Why would SHMEM_MAP make accessing the region fail for other backends -- what makes this missing from their translation?
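[For readers following the thread: the frontend-side handling the cover letter describes -- creating a RAMBlock from the fd received with SHMEM_MAP and attaching it as a subregion of the device's shared memory container -- maps roughly onto QEMU's memory API as sketched below. This is a minimal sketch, not code from the patch; the handler name, the request struct, and how the container region is obtained are assumptions, and the memory-API signatures vary slightly between QEMU versions.]

```c
/*
 * Illustrative sketch only -- not code from the patch.  The handler name,
 * the VhostUserShMemMap layout, and how the container MemoryRegion is
 * obtained are assumptions.
 */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "qapi/error.h"

typedef struct VhostUserShMemMap {  /* hypothetical request layout */
    uint64_t shm_offset;            /* offset inside the VIRTIO Shared Memory Region */
    uint64_t fd_offset;             /* offset inside the fd sent with the message */
    uint64_t len;                   /* length of the mapping */
} VhostUserShMemMap;

/*
 * 'container' is the MemoryRegion representing the device's VIRTIO Shared
 * Memory Region; it is assumed to have been created at realize time.
 */
static int shmem_map_sketch(Object *owner, MemoryRegion *container,
                            VhostUserShMemMap *req, int fd, Error **errp)
{
    MemoryRegion *mr = g_new0(MemoryRegion, 1);
    Error *local_err = NULL;

    /* Create a RAM block backed by the fd passed by the vhost-user backend. */
    memory_region_init_ram_from_fd(mr, owner, "vhost-user-shmem", req->len,
                                   RAM_SHARED, fd, req->fd_offset, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        g_free(mr);
        return -1;
    }

    /*
     * Attach the new RAM block at the requested offset inside the container.
     * The resulting memory transaction is what pushes an updated memory
     * table (SET_MEM_TABLE) to the vhost-user devices on commit.
     */
    memory_region_add_subregion(container, req->shm_offset, mr);
    return 0;
}
```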
> >>>>>
> >>>>> This issue was raised by Stefan Hajnoczi in one of the first iterations of this patchset, based upon David Gilbert's previous work on the virtiofs DAX window.
> >>>>>
> >>>>> Let me paste here some of his remarks:
> >>>>>
> >>>>> """
> >>>>> Other backends don't see these mappings. If the guest submits a vring descriptor referencing a mapping to another backend, then that backend won't be able to access this memory.
> >>>>> """
> >>>>> [...]
> >>>>> """
> >>>>> A bit more detail:
> >>>>>
> >>>>> Device A has a VIRTIO Shared Memory Region. An application mmaps that memory (examples: guest userspace driver using Linux VFIO, a guest kernel driver that exposes the memory to userspace via mmap, or guest kernel DAX). The application passes that memory as an I/O buffer to device B (e.g. O_DIRECT disk I/O).
> >>>>>
> >>>>> The result is that device B's vhost-user backend receives a vring descriptor that points to a guest memory address in device A's VIRTIO Shared Memory Region. Since device B does not have this memory in its table, it cannot translate the address and the device breaks.
> >>>>> """
> >>>>>
> >>>>> I have not triggered the issue myself. So the idea is that the next patch will *definitely* include some testing for the commits that I cannot verify with my local setup.
> >>>>
> >>>> Hah! But isn't that exactly the problem which is now solved by our rework?
> >>>>
> >>>> Whatever is mapped in the VIRTIO Shared Memory Region will be communicated to all other vhost-user devices. So they should have that memory in their map and should be able to access it.
> >>>
> >>> You mean the SET_MEM_TABLE message sent to all vhost-user devices after the vhost_commit? I was not sure, as I was testing with a single device. That would be great, and would simplify the patch a lot.
> >>
> >> Yes, all vhost-user devices should be updated.
> >
> > Then, I think I agree with you: it would seem that this approach naturally solves the issue with address translation among different devices, as they all get the most up-to-date memory table after each mmap.
> >
> > WDYT, @Stefan Hajnoczi?
> > If we are unsure, maybe we can leave the MEM_READ/WRITE support as a later extension, and try to integrate the rest of this patch first.
>
> As commented offline, maybe one would want the option to enable the alternative mode, where such updates (in the SHM region) are not sent to vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE would be unavoidable.
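[To make the last point above concrete: in a mode where SHM mappings are not propagated to the other vhost-user devices, a backend that fails to translate a descriptor address would have to bounce the access through the front-end. A minimal sketch of such a fallback follows; the message layout and helper names (VhostUserMemRWMsg, backend_gpa_to_va, backend_channel_mem_read) are placeholders, not the wire format defined by the patch.]

```c
/*
 * Hypothetical sketch of a backend-side MEM_READ fallback.  Struct and
 * function names are placeholders, not the format defined by the patch.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct VhostUserMemRWMsg {
    uint64_t guest_addr;   /* guest physical address to access */
    uint32_t size;         /* number of bytes to read */
    uint8_t  data[256];    /* payload carried back in the reply */
} VhostUserMemRWMsg;

/* Provided elsewhere by the backend: translate a guest address through the
 * current memory table; returns NULL when the address is not mapped. */
void *backend_gpa_to_va(uint64_t guest_addr, uint32_t size);

/* Provided elsewhere: send a MEM_READ request on the backend channel and
 * wait for the front-end to fill in msg->data. */
int backend_channel_mem_read(VhostUserMemRWMsg *msg);

static int read_guest_memory(uint64_t gpa, void *buf, uint32_t size)
{
    void *va = backend_gpa_to_va(gpa, size);

    assert(size <= sizeof(((VhostUserMemRWMsg *)0)->data));

    if (va) {
        /* Fast path: the address is in our memory table, e.g. because the
         * front-end pushed an updated SET_MEM_TABLE after a SHMEM_MAP. */
        memcpy(buf, va, size);
        return 0;
    }

    /* Slow path: ask the front-end to perform the access on our behalf. */
    VhostUserMemRWMsg msg = { .guest_addr = gpa, .size = size };
    if (backend_channel_mem_read(&msg) < 0) {
        return -1;
    }
    memcpy(buf, msg.data, size);
    return 0;
}
```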
At first, I remember we discussed two options: having update messages sent to all devices (which was deemed potentially racy), or using MEM_READ / MEM_WRITE messages. With this version of the patch there is no option to avoid the mem_table update messages, which brings me to my point in the previous message: it may make sense to continue with this patch without MEM_READ/WRITE support, and leave that, together with the option to make mem_table updates optional, for a follow-up patch?

> What comes to mind are vhost-user devices with a limited number of supported memslots.
>
> No idea how relevant that really is, and how many SHM regions we will see in practice.

In general, from what I see they usually require 1 or 2 regions, except for virtio-scmi, which requires >256.

> I recently increased the number of supported memslots for rust-vmm and libvhost-user, but not sure about other devices (in particular, dpdk, spdk, and whether we really care about that here).
>
> --
> Cheers,
>
> David / dhildenb
>