Hi,

On Monday, August 12, 2024 12:01:00 PM GMT+5:30 you wrote:
> On Sun, Aug 11, 2024 at 7:20 PM Sahil <icegambi...@gmail.com> wrote:
> > On Wednesday, August 7, 2024 9:52:10 PM GMT+5:30 Eugenio Perez Martin wrote:
> > > On Fri, Aug 2, 2024 at 1:22 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> > > > [...]
> > > > @@ -726,17 +738,30 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > > >      svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
> > > >      svq->num_free = svq->vring.num;
> > > >
> > > > -    svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > -                           -1, 0);
> > > > -    desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > -    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > -    svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > -                           -1, 0);
> > > > -    svq->desc_state = g_new0(SVQDescState, svq->vring.num);
> > > > -    svq->desc_next = g_new0(uint16_t, svq->vring.num);
> > > > -    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > > > +    svq->is_packed = virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED);
> > > > +
> > > > +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> > > > +        svq->vring_packed.vring.desc = mmap(NULL, vhost_svq_memory_packed(svq),
> > > > +                                            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > +                                            -1, 0);
> > > > +        desc_size = sizeof(struct vring_packed_desc) * svq->vring.num;
> > > > +        svq->vring_packed.vring.driver = (void *)((char *)svq->vring_packed.vring.desc + desc_size);
> > > > +        svq->vring_packed.vring.device = (void *)((char *)svq->vring_packed.vring.driver +
> > > > +                                                  sizeof(struct vring_packed_desc_event));
> > >
> > > This is a great start, but it will be problematic when you start
> > > mapping the areas to the vdpa device. The driver area should be
> > > read-only for the device, but it is placed in the same page as a RW
> > > one.
> > >
> > > More on this later.
> > >
> > > > +    } else {
> > > > +        svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > +                               -1, 0);
> > > > +        desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > +        svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > +        svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > +                               -1, 0);
> > > > +    }
> > >
> > > I think it will be beneficial to avoid "if (packed)" conditionals on
> > > the exposed functions that give information about the memory maps.
> > > These need to be replicated at
> > > hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings.
> > >
> > > However, the current one depends on the driver area living in the
> > > same page as the descriptor area, so it is not suitable for this.
> >
> > I haven't really understood this.
> >
> > In split vqs, the descriptor, driver and device areas are mapped to
> > RW pages. In vhost-vdpa.c:vhost_vdpa_svq_map_rings, the regions are
> > mapped with the appropriate "perm" field that sets the R/W
> > permissions in the DMAMap object. Is this problematic for the split
> > vq format because the avail ring is mapped to a RW page in
> > "vhost_svq_start" anyway?
>
> Ok, so maybe the word "map" is misleading here. The pages need to be
> allocated for the QEMU process with both PROT_READ | PROT_WRITE, as
> QEMU needs to write into them.
>
> They are mapped to the device with vhost_vdpa_dma_map, and the last
> bool parameter indicates whether the device needs write permission or
> not. You can see how hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_ring
> checks the needle permission for this, and the needle permissions are
> stored at hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings. This is
> the function that needs to check the maps' permissions.
I think I have understood what's going on in "vhost_vdpa_svq_map_rings",
"vhost_vdpa_svq_map_ring" and "vhost_vdpa_dma_map". But based on what I have
understood, it looks like the driver area is mapped to an iova which is
read-only for vhost-vdpa. Please let me know where I am going wrong.

Consider the following implementation in hw/virtio/vhost-vdpa.c:

>     size_t device_size = vhost_svq_device_area_size(svq);
>     size_t driver_size = vhost_svq_driver_area_size(svq);

The driver size includes the descriptor area and the driver area. For packed
vq, the driver area is the "driver event suppression" structure, which should
be read-only for the device according to the virtio spec (section 2.8.10) [1].

>     size_t avail_offset;
>     bool ok;
>
>     vhost_svq_get_vring_addr(svq, &svq_addr);

Here, "svq_addr.desc_user_addr" points to the descriptor area, while
"svq_addr.avail_user_addr" points to the driver area/driver event suppression
structure.

>     driver_region = (DMAMap) {
>         .translated_addr = svq_addr.desc_user_addr,
>         .size = driver_size - 1,
>         .perm = IOMMU_RO,
>     };

This region points to the descriptor area, and its size encompasses the
driver area as well, with RO permission.

>     ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);

The above function checks the value of needle->perm and sees that it is RO.
It then calls "vhost_vdpa_dma_map" with the following arguments:

>     r = vhost_vdpa_dma_map(v->shared, v->address_space_id, needle->iova,
>                            needle->size + 1,
>                            (void *)(uintptr_t)needle->translated_addr,
>                            needle->perm == IOMMU_RO);

Since needle->size includes the driver area as well, the driver area will be
mapped to a RO page in the device's address space, right?
>     if (unlikely(!ok)) {
>         error_prepend(errp, "Cannot create vq driver region: ");
>         return false;
>     }
>     addr->desc_user_addr = driver_region.iova;
>     avail_offset = svq_addr.avail_user_addr - svq_addr.desc_user_addr;
>     addr->avail_user_addr = driver_region.iova + avail_offset;

I think "addr->desc_user_addr" and "addr->avail_user_addr" will both be
mapped to a RO page in the device's address space.

>     device_region = (DMAMap) {
>         .translated_addr = svq_addr.used_user_addr,
>         .size = device_size - 1,
>         .perm = IOMMU_RW,
>     };

The device area/device event suppression structure, on the other hand, will
be mapped to a RW page.

I also think there are other issues with the current state of the patch.
According to the virtio spec (section 2.8.10) [1], the "device event
suppression" structure needs to be write-only for the device, but it is
mapped to a RW page.

Another concern I have is regarding the driver area size for packed vq. In
"hw/virtio/vhost-shadow-virtqueue.c" of the current patch:

> size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> {
>     size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
>     size_t avail_size = offsetof(vring_avail_t, ring[svq->vring.num]) +
>                         sizeof(uint16_t);
>
>     return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size());
> }
>
> [...]
>
> size_t vhost_svq_memory_packed(const VhostShadowVirtqueue *svq)
> {
>     size_t desc_size = sizeof(struct vring_packed_desc) * svq->num_free;
>     size_t driver_event_suppression = sizeof(struct vring_packed_desc_event);
>     size_t device_event_suppression = sizeof(struct vring_packed_desc_event);
>
>     return ROUND_UP(desc_size + driver_event_suppression + device_event_suppression,
>                     qemu_real_host_page_size());
> }

The size returned by "vhost_svq_driver_area_size" might not be the actual
driver size, which for packed vq is given by desc_size +
driver_event_suppression, right? Will this have to be changed too?
Thanks,
Sahil

[1] https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-720008