On Thu, Mar 27, 2025 at 7:42 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> On 3/26/25 1:33 PM, Eugenio Perez Martin wrote:
> > On Mon, Mar 24, 2025 at 3:14 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> On 3/24/25 7:29 PM, Sahil Siddiq wrote:
> >>> Implement the insertion of available buffers in the descriptor area of
> >>> packed shadow virtqueues. It takes into account descriptor chains, but
> >>> does not consider indirect descriptors.
> >>>
> >>> Enable the packed SVQ to forward the descriptors to the device.
> >>>
> >>> Signed-off-by: Sahil Siddiq <sahil...@proton.me>
> >>> ---
> >>> Changes from v4 -> v5:
> >>> - This was commit #2 in v4. This has been reordered to commit #3
> >>>   based on review comments.
> >>> - vhost-shadow-virtqueue.c:
> >>>   (vhost_svq_valid_features): Move addition of enums to commit #6
> >>>   based on review comments.
> >>>   (vhost_svq_add_packed): Set head_idx to buffer id instead of vring's
> >>>   index.
> >>>   (vhost_svq_kick): Split into vhost_svq_kick_split and
> >>>   vhost_svq_kick_packed.
> >>>   (vhost_svq_add): Use new vhost_svq_kick_* functions.
> >>>
> >>>  hw/virtio/vhost-shadow-virtqueue.c | 117 +++++++++++++++++++++++++++--
> >>>  1 file changed, 112 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> >>> index 4f74ad402a..6e16cd4bdf 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>> @@ -193,10 +193,83 @@ static void vhost_svq_add_split(VhostShadowVirtqueue *svq,
> >>>      /* Update the avail index after write the descriptor */
> >>>      smp_wmb();
> >>>      avail->idx = cpu_to_le16(svq->shadow_avail_idx);
> >>> +}
> >>> +
> >>> +/**
> >>> + * Write descriptors to SVQ packed vring
> >>> + *
> >>> + * @svq: The shadow virtqueue
> >>> + * @out_sg: The iovec to the guest
> >>> + * @out_num: Outgoing iovec length
> >>> + * @in_sg: The iovec from the guest
> >>> + * @in_num: Incoming iovec length
> >>> + * @sgs: Cache for hwaddr
> >>> + * @head: Saves current free_head
> >>> + */
> >>> +static void vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> >>> +                                 const struct iovec *out_sg, size_t out_num,
> >>> +                                 const struct iovec *in_sg, size_t in_num,
> >>> +                                 hwaddr *sgs, unsigned *head)
> >>> +{
> >>> +    uint16_t id, curr, i, head_flags = 0, head_idx;
> >>> +    size_t num = out_num + in_num;
> >>> +    unsigned n;
> >>> +
> >>> +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> >>> +
> >>> +    head_idx = svq->vring_packed.next_avail_idx;
> >>
> >> Since "svq->vring_packed.next_avail_idx" is part of QEMU internals and not
> >> stored in guest memory, no endianness conversion is required here, right?
> >>
> >
> > Right!
>
> Understood.
>
> >>> +    i = head_idx;
> >>> +    id = svq->free_head;
> >>> +    curr = id;
> >>> +    *head = id;
> >>
> >> Should head be the buffer id or the idx of the descriptor ring where the
> >> first descriptor of a descriptor chain is inserted?
> >>
> >
> > The buffer id of the *last* descriptor of a chain.
> > See "2.8.6 Next Flag: Descriptor Chaining" at [1].
>
> Ah, yes. The second half of my question is incorrect.
>
> The tail descriptor of the chain includes the buffer id. In this implementation
> we place the same tail buffer id in other locations of the descriptor ring since
> they will be ignored anyway [1].
>
> The explanation below frames my query better.
>
> >>> +    /* Write descriptors to SVQ packed vring */
> >>> +    for (n = 0; n < num; n++) {
> >>> +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> >>> +                                     (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> >>> +                                     (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> >>> +        if (i == head_idx) {
> >>> +            head_flags = flags;
> >>> +        } else {
> >>> +            descs[i].flags = flags;
> >>> +        }
> >>> +
> >>> +        descs[i].addr = cpu_to_le64(sgs[n]);
> >>> +        descs[i].id = id;
> >>> +        if (n < out_num) {
> >>> +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> >>> +        } else {
> >>> +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> >>> +        }
> >>> +
> >>> +        curr = cpu_to_le16(svq->desc_next[curr]);
> >>> +
> >>> +        if (++i >= svq->vring_packed.vring.num) {
> >>> +            i = 0;
> >>> +            svq->vring_packed.avail_used_flags ^=
> >>> +                    1 << VRING_PACKED_DESC_F_AVAIL |
> >>> +                    1 << VRING_PACKED_DESC_F_USED;
> >>> +        }
> >>> +    }
> >>>
> >>> +    if (i <= head_idx) {
> >>> +        svq->vring_packed.avail_wrap_counter ^= 1;
> >>> +    }
> >>> +
> >>> +    svq->vring_packed.next_avail_idx = i;
> >>> +    svq->shadow_avail_idx = i;
> >>> +    svq->free_head = curr;
> >>> +
> >>> +    /*
> >>> +     * A driver MUST NOT make the first descriptor in the list
> >>> +     * available before all subsequent descriptors comprising
> >>> +     * the list are made available.
> >>> +     */
> >>> +    smp_wmb();
> >>> +    svq->vring_packed.vring.desc[head_idx].flags = head_flags;
> >>>  }
> >>>
> >>> -static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >>> +static void vhost_svq_kick_split(VhostShadowVirtqueue *svq)
> >>>  {
> >>>      bool needs_kick;
> >>>
> >>> @@ -209,7 +282,8 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >>>      if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>>          uint16_t avail_event = le16_to_cpu(
> >>>                  *(uint16_t *)(&svq->vring.used->ring[svq->vring.num]));
> >>> -        needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx, svq->shadow_avail_idx - 1);
> >>> +        needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx,
> >>> +                                      svq->shadow_avail_idx - 1);
> >>>      } else {
> >>>          needs_kick =
> >>>                  !(svq->vring.used->flags & cpu_to_le16(VRING_USED_F_NO_NOTIFY));
> >>> @@ -222,6 +296,30 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >>>      event_notifier_set(&svq->hdev_kick);
> >>>  }
> >>>
> >>> +static void vhost_svq_kick_packed(VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    bool needs_kick;
> >>> +
> >>> +    /*
> >>> +     * We need to expose the available array entries before checking
> >>> +     * notification suppressions.
> >>> +     */
> >>> +    smp_mb();
> >>> +
> >>> +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>> +        return;
> >>> +    } else {
> >>> +        needs_kick = (svq->vring_packed.vring.device->flags !=
> >>> +                      cpu_to_le16(VRING_PACKED_EVENT_FLAG_DISABLE));
> >>> +    }
> >>> +
> >>> +    if (!needs_kick) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    event_notifier_set(&svq->hdev_kick);
> >>> +}
> >>> +
> >>>  /**
> >>>   * Add an element to a SVQ.
> >>>   *
> >>> @@ -258,13 +356,22 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >>>          return -EINVAL;
> >>>      }
> >>>
> >>> -    vhost_svq_add_split(svq, out_sg, out_num, in_sg,
> >>> -                        in_num, sgs, &qemu_head);
> >>> +    if (svq->is_packed) {
> >>> +        vhost_svq_add_packed(svq, out_sg, out_num, in_sg,
> >>> +                             in_num, sgs, &qemu_head);
> >>> +    } else {
> >>> +        vhost_svq_add_split(svq, out_sg, out_num, in_sg,
> >>> +                            in_num, sgs, &qemu_head);
> >>> +    }
> >>>
> >>>      svq->num_free -= ndescs;
> >>>      svq->desc_state[qemu_head].elem = elem;
> >>>      svq->desc_state[qemu_head].ndescs = ndescs;
> >>
> >> *head in vhost_svq_add_packed() is stored in "qemu_head" here.
> >>
> >
> > Sorry I don't get this, can you expand?
>
> Sure. In vhost_svq_add(), after the descriptors have been added
> (either using vhost_svq_add_split or vhost_svq_add_packed),
> VirtQueueElement elem and ndescs are both saved in the
> svq->desc_state array. "elem" and "ndescs" are later used when
> the guest consumes used descriptors from the device in
> vhost_svq_get_buf_(split|packed).
>
> For split vqs, the index of svq->desc_state where elem and ndescs
> are saved matches the index of the descriptor ring where the head
> of the descriptor chain is placed.
>
> In vhost_svq_add_split:
>
>     *head = svq->free_head;
>     [...]
>     avail_idx = svq->shadow_avail_idx & (svq->vring.num - 1);
>     avail->ring[avail_idx] = cpu_to_le16(*head);
>
> "qemu_head" in vhost_svq_add gets its value from "*head" in
> vhost_svq_add_split:
>
>     svq->desc_state[qemu_head].elem = elem;
>     svq->desc_state[qemu_head].ndescs = ndescs;
>
> For packed vq, something similar has to be done. My approach was
> to have the index of svq->desc_state match the buffer id in the
> tail of the descriptor ring.
>
> The entire chain is written to the descriptor ring in the loop
> in vhost_svq_add_packed.
> I am not sure if the index of
> svq->desc_state should be the buffer id or if it should be a
> descriptor index ("head_idx" or the index corresponding to the
> tail of the chain).
>
I think both approaches should be valid. My advice is to follow Linux's
code and let it be the tail descriptor id. This descriptor id is pushed
to and popped from vq->free_head in a stack style. In addition to that,
Linux also sets the same id on all the elements of the chain. I think
this is useful when dealing with bad devices. In particular, QEMU's
packed vq implementation looked at the first descriptor's id, which is
incorrect behavior.