On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefa...@redhat.com> wrote:

> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > handle VIRTIO Shared Memory mappings.
> >
> > This request allows backends to dynamically map
> > fds into a VIRTIO Shared Memory Region identified
> > by its `shmid`. Then, the fd memory is advertised
> > to the driver as a base address + offset, so it
> > can be read/written (depending on the mmap flags
> > requested) while it is valid.
> >
> > The backend can munmap the memory range
> > in a given VIRTIO Shared Memory Region (again,
> > identified by its `shmid`), to free it. Upon
> > receiving this message, the front-end must
> > mmap the range with PROT_NONE to reserve
> > the virtual memory space.
> >
> > The device model needs to create MemoryRegion
> > instances for the VIRTIO Shared Memory Regions
> > and add them to the `VirtIODevice` instance.
> >
> > Signed-off-by: Albert Esteve <aest...@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               |  27 +++++
> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
> >  hw/virtio/virtio.c                        |  12 +++
> >  include/hw/virtio/virtio.h                |   5 +
> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
> >  6 files changed, 284 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..d52ba719d5 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> >   when the operation is successful, or non-zero otherwise. Note that if the
> >   operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :request payload: fd and ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends to advertise a new mapping
> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving the
> > +  message, the front-end will mmap the given fd into the VIRTIO Shared
> > +  Memory Region with the requested ``shmid``. A reply is generated
> > +  indicating whether the mapping succeeded.
> > +
> > +  Mapping over an already existing map is not allowed and the request
> > +  shall fail. Therefore, the memory range in the request must correspond
> > +  to a valid, free region of the VIRTIO Shared Memory Region.
> > +
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > +  :id: 10
> > +  :equivalent ioctl: N/A
> > +  :request payload: ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends so that the front-end
> > +  un-mmaps a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> > +  Region with
>
> s/offset/shm_offset/
>
> > +  the requested ``shmid``.
>
> Please clarify that <offset, len> must correspond to the entirety of a
> valid mapped region.
>
> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> DAX Window:
>
>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>   mapping, the previous mapping is replaced. When a mapping partially
>   overlaps a previous mapping, the previous mapping is split into one or
>   two smaller mappings. When a mapping is partially unmapped it is also
>   split into one or two smaller mappings.
>
>   Establishing new mappings or splitting existing mappings consumes
>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>   request fails until resources are available again following
>   FUSE_REMOVEMAPPING.
>
> I think SETUPMAPPING/REMOVEMAPPING can be implemented using
> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> partial ranges, but as far as I know that's not necessary for virtiofs
> in practice.
>
> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> can fail when there are no resources available. The process-wide limit
> is vm.max_map_count on Linux, although a vhost-user frontend may reduce
> it further to control vhost-user resource usage.
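
If we want the front-end to bound its own usage rather than rely on the
process-wide limit, a per-device counter checked in the map handler might
be enough. A hypothetical sketch (the field and the cap below are made up
for illustration, they are not part of this patch):

    #define VHOST_USER_MAX_SHMEM_MAPPINGS 4096   /* illustrative cap */

    if (u->n_shmem_mappings >= VHOST_USER_MAX_SHMEM_MAPPINGS) {
        error_report("Too many active SHMEM mappings");
        return -EBUSY;
    }
    u->n_shmem_mappings++;   /* decremented again on SHMEM_UNMAP */
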
>
> > +  A reply is generated indicating whether unmapping succeeded.
> > +
> >  .. _reply_ack:
> >
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..7ee8a472c6 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[16];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >
> > @@ -224,6 +244,7 @@ typedef union {
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> >          VhostUserTransferDeviceState transfer_state;
> > +        VhostUserMMap mmap;
> >  } VhostUserPayload;
> >
> >  typedef struct VhostUserMsg {
> > @@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >      return 0;
> >  }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > +                                    VhostUserMMap *vu_mmap,
> > +                                    int fd)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return -EBADF;
> > +    }
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for overlap between range [shm_offset, shm_offset + len)
> and existing mappings.
>

I am not sure how to implement this check. Specifically, I do not know how
previously mapped ranges are tracked within the MemoryRegion. Would looping
through mr->subregions be a valid option?
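
Otherwise, one option could be to keep explicit per-region bookkeeping in
the device model instead of relying on MemoryRegion internals. A rough,
untested sketch of what I have in mind (MappedRange and the helpers below
are made up for illustration, they are not part of this patch; assumes
qemu/osdep.h and qemu/queue.h):

    typedef struct MappedRange {
        uint64_t start;   /* offset within the VIRTIO Shared Memory Region */
        uint64_t len;
        QTAILQ_ENTRY(MappedRange) link;
    } MappedRange;

    typedef QTAILQ_HEAD(, MappedRange) MappedRangeList;

    /* True if [offset, offset + len) overlaps any tracked mapping */
    static bool shmem_range_overlaps(MappedRangeList *ranges,
                                     uint64_t offset, uint64_t len)
    {
        MappedRange *r;

        QTAILQ_FOREACH(r, ranges, link) {
            if (offset < r->start + r->len && r->start < offset + len) {
                return true;
            }
        }
        return false;
    }

    /* True if [offset, offset + len) exactly matches a tracked mapping;
     * this would also cover the exact-match check you ask for in the
     * SHMEM_UNMAP handler below */
    static bool shmem_range_matches(MappedRangeList *ranges,
                                    uint64_t offset, uint64_t len)
    {
        MappedRange *r;

        QTAILQ_FOREACH(r, ranges, link) {
            if (r->start == offset && r->len == len) {
                return true;
            }
        }
        return false;
    }

SHMEM_MAP would then add a range to the list on success and SHMEM_UNMAP
would remove it, so both handlers can validate requests against the list.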


>
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to mmap mem fd");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > +                                      VhostUserMMap *vu_mmap)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for unmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for existing mapping with exact range [shm_offset, len)
> match.
>
> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to unmap memory");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static void close_backend_channel(struct vhost_user *u)
> >  {
> >      g_source_destroy(u->backend_src);
> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
> >          ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >                                                               &hdr, &payload);
> >          break;
> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > +                                                  fd ? fd[0] : -1);
> > +        break;
> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >          ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..9f2da5b11e 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> >  }
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > +{
> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > +    ++vdev->n_shmem_regions;
> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > +                               vdev->n_shmem_regions);
>
> Where is shmem_list freed?
>
> The name "list" is misleading since this is an array, not a list.
>
> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > +    return mr;
> > +}
>
> This looks weird. The contents of mr are copied into shmem_list[] and
> then the pointer to mr is returned? Did you mean for the field's type to
> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> stash the pointer?
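
If I am reading your suggestion right, the pointer-array variant would look
roughly like this (untested sketch, with the field changed to
MemoryRegion **shmem_list):

    MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
    {
        MemoryRegion *mr = g_new0(MemoryRegion, 1);

        vdev->shmem_list = g_renew(MemoryRegion *, vdev->shmem_list,
                                   vdev->n_shmem_regions + 1);
        vdev->shmem_list[vdev->n_shmem_regions++] = mr;
        return mr;
    }

I would then free the array and its elements from the device teardown path
to address the leak you pointed out.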
>
> > +
> >  /* A wrapper for use as a VMState .put function */
> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> >                                const VMStateField *field, JSONWriter *vmdesc)
> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
> >              virtio_vmstate_change, vdev);
> >      vdev->device_endian = virtio_default_endian();
> >      vdev->use_guest_notifier_mask = true;
> > +    vdev->shmem_list = NULL;
> > +    vdev->n_shmem_regions = 0;
> >  }
> >
> >  /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..16d598aadc 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >       */
> >      EventNotifier config_notifier;
> >      bool device_iotlb_enabled;
> > +    /* Shared memory region for vhost-user mappings. */
> > +    MemoryRegion *shmem_list;
> > +    int n_shmem_regions;
> >  };
> >
> >  struct VirtioDeviceClass {
> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> >
> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > +
> >  extern const VMStateInfo virtio_vmstate_info;
> >
> >  #define VMSTATE_VIRTIO_DEVICE \
> > diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> > index a879149fef..28556d183a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
> >      return vu_send_message(dev, &msg);
> >  }
> >
> > +bool
> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> > +{
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +            .flags = flags,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > +bool
> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +               uint64_t shm_offset, uint64_t len)
> > +{
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
>
> What is the meaning of this field? I expected it to be set to 0.
>
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
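
For context, a backend would drive these helpers roughly as follows (an
illustrative sketch; the shmid, offsets, and length are made-up values and
error handling is elided):

    /* Request a 64 KiB read/write mapping at offset 0 of shmem region 0 */
    if (!vu_shmem_map(dev, 0, 0, 0, 0x10000,
                      VHOST_USER_FLAG_MAP_R | VHOST_USER_FLAG_MAP_W)) {
        /* map request failed, e.g. range overlap or no resources */
    }

    /* Later, unmap exactly the same range (fd_offset passed as 0) */
    vu_shmem_unmap(dev, 0, 0, 0, 0x10000);
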
> >  static bool
> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >  {
> > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> > index deb40e77b3..7f6c22cc1a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[UUID_LEN];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> >  # define VU_PACKED __attribute__((gcc_struct, packed))
> >  #else
> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> >          VhostUserVringArea area;
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> > +        VhostUserMMap mmap;
> >      } payload;
> >
> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >   */
> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + * @flags: Flags for the mmap operation
> > + *
> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> > +
> > +/**
> > + * vu_shmem_unmap:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + *
> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
> > + * with the requested `shmid`.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                    uint64_t shm_offset, uint64_t len);
> > +
> >  /**
> >   * vu_queue_set_notification:
> >   * @dev: a VuDev context
> > --
> > 2.45.2
> >
>
