On Mon, Aug 12, 2024 at 9:32 PM Sahil <icegambi...@gmail.com> wrote:
>
> Hi,
>
> On Monday, August 12, 2024 12:01:00 PM GMT+5:30 you wrote:
> > On Sun, Aug 11, 2024 at 7:20 PM Sahil <icegambi...@gmail.com> wrote:
> > > On Wednesday, August 7, 2024 9:52:10 PM GMT+5:30 Eugenio Perez Martin wrote:
> > > > On Fri, Aug 2, 2024 at 1:22 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> > > > > [...]
> > > > > @@ -726,17 +738,30 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > > > >      svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
> > > > >      svq->num_free = svq->vring.num;
> > > > >
> > > > > -    svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > -                           -1, 0);
> > > > > -    desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > > -    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > > -    svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > -                           -1, 0);
> > > > > -    svq->desc_state = g_new0(SVQDescState, svq->vring.num);
> > > > > -    svq->desc_next = g_new0(uint16_t, svq->vring.num);
> > > > > -    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > > > > +    svq->is_packed = virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED);
> > > > > +
> > > > > +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> > > > > +        svq->vring_packed.vring.desc = mmap(NULL, vhost_svq_memory_packed(svq),
> > > > > +                                            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                                            -1, 0);
> > > > > +        desc_size = sizeof(struct vring_packed_desc) * svq->vring.num;
> > > > > +        svq->vring_packed.vring.driver = (void *)((char *)svq->vring_packed.vring.desc + desc_size);
> > > > > +        svq->vring_packed.vring.device = (void *)((char *)svq->vring_packed.vring.driver +
> > > > > +                                                   sizeof(struct vring_packed_desc_event));
> > > >
> > > > This is a great start, but it will be problematic when you start
> > > > mapping the areas to the vdpa device. The driver area should be read
> > > > only for the device, but it is placed in the same page as a RW one.
> > > >
> > > > More on this later.
> > > >
> > > > > +    } else {
> > > > > +        svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                               -1, 0);
> > > > > +        desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > > +        svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > > +        svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                               -1, 0);
> > > > > +    }
> > > >
> > > > I think it will be beneficial to avoid "if (packed)" conditionals on
> > > > the exposed functions that give information about the memory maps.
> > > > These need to be replicated at
> > > > hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings.
> > > >
> > > > However, the current one depends on the driver area living in the
> > > > same page as the descriptor area, so it is not suitable for this.
> > >
> > > I haven't really understood this.
> > >
> > > In split vqs the descriptor, driver and device areas are mapped to RW
> > > pages.
> > > In vhost_vdpa.c:vhost_vdpa_svq_map_rings, the regions are mapped with
> > > the appropriate "perm" field that sets the R/W permissions in the DMAMap
> > > object. Is this problematic for the split vq format because the avail
> > > ring is anyway mapped to a RW page in "vhost_svq_start"?
> >
> > Ok, so maybe the word "map" is misleading here. The pages need to be
> > allocated for the QEMU process with both PROT_READ | PROT_WRITE, as
> > QEMU needs to write into them.
> >
> > They are mapped to the device with vhost_vdpa_dma_map, and the last
> > bool parameter indicates whether the device needs write permissions or
> > not. You can see how hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_ring
> > checks the needle permission for this, and the needle permissions are
> > stored at hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings. This is the
> > function that needs to check for the maps' permissions.
> >
>
> I think I have understood what's going on in "vhost_vdpa_svq_map_rings",
> "vhost_vdpa_svq_map_ring" and "vhost_vdpa_dma_map". But based on
> what I have understood, it looks like the driver area is getting mapped to
> an iova which is read-only for vhost_vdpa. Please let me know where I am
> going wrong.
>
You're not going wrong there. The device does not need to write into this
area, so we map it read only.

> Consider the following implementation in hw/virtio/vhost_vdpa.c:
>
> >     size_t device_size = vhost_svq_device_area_size(svq);
> >     size_t driver_size = vhost_svq_driver_area_size(svq);
>
> The driver size includes the descriptor area and the driver area. For
> packed vq, the driver area is the "driver event suppression" structure,
> which should be read-only for the device according to the virtio spec
> (section 2.8.10) [1].
>
> >     size_t avail_offset;
> >     bool ok;
> >
> >     vhost_svq_get_vring_addr(svq, &svq_addr);
>
> Over here "svq_addr.desc_user_addr" will point to the descriptor area,
> while "svq_addr.avail_user_addr" will point to the driver area/driver
> event suppression structure.
>
> >     driver_region = (DMAMap) {
> >         .translated_addr = svq_addr.desc_user_addr,
> >         .size = driver_size - 1,
> >         .perm = IOMMU_RO,
> >     };
>
> This region points to the descriptor area, and its size encompasses the
> driver area as well with RO permission.
>
> >     ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
>
> The above function checks the value of needle->perm and sees that it is RO.
> It then calls "vhost_vdpa_dma_map" with the following arguments:
>
> >     r = vhost_vdpa_dma_map(v->shared, v->address_space_id, needle->iova,
> >                            needle->size + 1,
> >                            (void *)(uintptr_t)needle->translated_addr,
> >                            needle->perm == IOMMU_RO);
>
> Since needle->size includes the driver area as well, the driver area will be
> mapped to a RO page in the device's address space, right?
>

Yes, the device does not need to write into the descriptor area in the
supported split virtqueue case, so the descriptor area is also mapped RO
at this moment. This changes in the packed virtqueue case, where the device
writes used descriptors back into the ring, so we need to map it RW.
> >     if (unlikely(!ok)) {
> >         error_prepend(errp, "Cannot create vq driver region: ");
> >         return false;
> >     }
> >     addr->desc_user_addr = driver_region.iova;
> >     avail_offset = svq_addr.avail_user_addr - svq_addr.desc_user_addr;
> >     addr->avail_user_addr = driver_region.iova + avail_offset;
>
> I think "addr->desc_user_addr" and "addr->avail_user_addr" will both be
> mapped to a RO page in the device's address space.
>
> >     device_region = (DMAMap) {
> >         .translated_addr = svq_addr.used_user_addr,
> >         .size = device_size - 1,
> >         .perm = IOMMU_RW,
> >     };
>
> The device area/device event suppression structure, on the other hand, will
> be mapped to a RW page.
>
> I also think there are other issues with the current state of the patch.
> According to the virtio spec (section 2.8.10) [1], the "device event
> suppression" structure needs to be write-only for the device but is
> mapped to a RW page.
>

Yes, I'm not sure if all IOMMUs support write-only maps, to be honest.

> Another concern I have is regarding the driver area size for packed vq. In
> "hw/virtio/vhost-shadow-virtqueue.c" of the current patch:
>
> > size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> > {
> >     size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
> >     size_t avail_size = offsetof(vring_avail_t, ring[svq->vring.num]) +
> >                         sizeof(uint16_t);
> >
> >     return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size());
> > }
> >
> > [...]
> >
> > size_t vhost_svq_memory_packed(const VhostShadowVirtqueue *svq)
> > {
> >     size_t desc_size = sizeof(struct vring_packed_desc) * svq->num_free;
> >     size_t driver_event_suppression = sizeof(struct vring_packed_desc_event);
> >     size_t device_event_suppression = sizeof(struct vring_packed_desc_event);
> >
> >     return ROUND_UP(desc_size + driver_event_suppression + device_event_suppression,
> >                     qemu_real_host_page_size());
> > }
>
> The size returned by "vhost_svq_driver_area_size" might not be the actual
> driver size, which is given by desc_size + driver_event_suppression, right?
> Will this have to be changed too?
>

Yes, you're right, this needs to be changed too.

> Thanks,
> Sahil
>
> [1] https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-720008
>