On Wed, May 14, 2025 at 8:22 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> Apologies, I haven't been in touch for a while. I have an update that
> I would like to give.
>
> On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
> > On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
> >>> On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>>> I managed to fix a few issues while testing this patch series.
> >>>> There is still one issue that I am unable to resolve. I thought
> >>>> I would send this patch series for review in case I have missed
> >>>> something.
> >>>>
> >>>> The issue is that this patch series does not work every time. I
> >>>> am able to ping L0 from L2 and vice versa via packed SVQ when it
> >>>> works.
> >>>
> >>> So we're on a very good track then!
> >>>
> >>>> When this doesn't work, both VMs throw a "Destination Host
> >>>> Unreachable" error. This is sometimes (not always) accompanied
> >>>> by the following kernel error (thrown by the L2 kernel):
> >>>>
> >>>> virtio_net virtio1: output.0:id 1 is not a head!
> >>>
> >>> How many packets have been sent or received before hitting this? If
> >>> the answer to that is "the vq size", maybe there is a bug in the code
> >>> that handles the wraparound of the packed vq, as the used and avail
> >>> flags need to be twisted. You can count them in the SVQ code.
> >>
> >> I did a lot more testing. This issue is quite unpredictable in terms
> >> of the time at which it appears after booting L2. So far, it has
> >> almost always appeared after booting L2. Even when pinging works,
> >> the issue appears after several seconds of pinging.
> >
> > Maybe you can speed it up with ping -f?
>
> Thank you, I was able to run tests much faster with the -f option. So
> far, I have noticed that the RX queue does not give problems. When all
> the descriptors are used, it is able to wrap around without issues.
>
> >> The total number of SVQ descriptors varied in every test run, but in
> >> every case, all 256 indices were filled in the descriptor region for
> >> the vq with vq_idx = 0. This is the RX vq, right?
> >
> > Right!
>
> The TX queue seems to be problematic. More on this below.
>
> >> This was filled while L2 was booting. In the case when the ctrl vq
> >> is disabled, I am not sure what is responsible for filling the vqs
> >> in the data plane during booting.
> >
> > The nested guest's driver fills the rx queue at startup. After that,
> > the nested guest kicks and SVQ receives the descriptors. It copies
> > the descriptors to the shadow virtqueue and then kicks L0 QEMU.
>
> Understood.
>
> >> =====
> >> The issue is hit most frequently when the following commands are run
> >> in L0:
> >>
> >> $ ip addr add 111.1.1.1/24 dev tap0
> >> $ ip link set tap0 up
> >>
> >> or when running the following in L2:
> >>
> >> # ip addr add 111.1.1.2/24 dev eth0
> >
> > I guess those are able to start the network, aren't they?
>
> Yes, that's correct.
>
> >> The other vq (vq_idx = 1) is not filled completely before the issue
> >> is hit. I have been noting down the numbers and here is an example:
> >>
> >> 295 descriptors were added individually to the queues, i.e., there
> >> were no chains (vhost_svq_add_packed)
> >> |_ 256 additions in vq_idx = 0, all with unique ids
> >> |    |---- 27 descriptors (ids 0 through 26) were received later
> >> |          from the device (vhost_svq_get_buf_packed)
> >> |_ 39 additions in vq_idx = 1
> >>      |_ 13 descriptors had id = 0
> >>      |_ 26 descriptors had id = 1
> >>      |---- All descriptors were received at some point from the
> >>            device (vhost_svq_get_buf_packed)
> >>
> >> There was one case in which vq_idx = 0 had wrapped around. I verified
> >> that the flags were set appropriately during the wrap (avail and used
> >> flags were flipped as expected).
> >
> > Ok, sounds like you're able to reach it before filling the queue. I'd
> > go for debugging notifications for this one then. More on this below.
>
> >> =====
> >> The next common situation where this issue is hit is during startup.
> >> Before L2 can finish booting successfully, this error is thrown:
> >>
> >> virtio_net virtio1: output.0:id 0 is not a head!
> >>
> >> 258 descriptors were added individually to the queues during startup,
> >> i.e., there were no chains (vhost_svq_add_packed)
> >> |_ 256 additions in vq_idx = 0, all with unique ids
> >> |    |---- None of them were received by the device
> >> |          (vhost_svq_get_buf_packed)
> >> |_ 2 additions in vq_idx = 1
> >>      |_ id = 0 in index 0
> >>      |_ id = 1 in index 1
> >>      |---- Both descriptors were received at some point during
> >>            startup from the device (vhost_svq_get_buf_packed)
> >>
> >> =====
> >> Another case is after several seconds of pinging L0 from L2:
> >>
> >> [ 99.034114] virtio_net virtio1: output.0:id 0 is not a head!
> >
> > So the L2 guest sees a descriptor that it has not made available
> > previously. This can be caused by SVQ returning the same descriptor
> > twice, or by SVQ not filling the id or flags properly. It could also
> > be caused by us not protecting the write ordering in the ring, but I
> > don't see anything obviously wrong by looking at the code.
>
> >> 366 descriptors were added individually to the queues, i.e., there
> >> were no chains (vhost_svq_add_packed)
> >> |_ 289 additions in vq_idx = 0; wrap-around was observed, with avail
> >> |  and used flags inverted for 33 descriptors
> >> |    |---- 40 descriptors (ids 0 through 39) were received from the
> >> |          device (vhost_svq_get_buf_packed)
> >> |_ 77 additions in vq_idx = 1
> >>      |_ 76 descriptors had id = 0
> >>      |_ 1 descriptor had id = 1
> >>      |---- All 77 descriptors were received at some point from the
> >>            device (vhost_svq_get_buf_packed)
> >>
> >> I am not entirely sure now if there's an issue in the packed vq
> >> implementation in QEMU or if this is being caused by some sort of
> >> race condition in Linux.
> >>
> >> "id is not a head" is being thrown because vq->packed.desc_state[id].data
> >> doesn't exist for the corresponding id in Linux [1]. But QEMU seems to
> >> have stored some data for this id via vhost_svq_add() [2]. Linux sets
> >> the value of vq->packed.desc_state[id].data in its version of
> >> virtqueue_add_packed() [3].
> >
> > Let's keep debugging further. Can you trace the ids that the L2 kernel
> > makes available, and then the ones that it uses? At the same time, can
> > you trace the ids that the SVQ sees in vhost_svq_get_buf and the ones
> > that it flushes? This allows us to check the set of available
> > descriptors at any given time.
>
> In the Linux kernel, I am printing which descriptor is received in which
> queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
> I see the following lines getting printed for the TX queue:
>
> [ 192.101591] output.0 -> id: 0
> [ 213.737417] output.0 -> id: 0
> [ 213.738714] output.0 -> id: 1
> [ 213.740093] output.0 -> id: 0
> [ 213.741521] virtio_net virtio1: output.0:id 0 is not a head!
I find it peculiar that it fails on the first descriptor with id 1. Do
you have any other descriptor with id 1 previously? Does it fail
consistently with id 1? You should have descriptors with id 1 and more
in the rx queue, and the code should not be able to tell the difference,
so it seems weird that it fails with tx. But who knows :).

> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_add_packed(), I
> am printing the head_idx, id, len, flags and vq_idx. Just before the
> crash, the following lines are printed:
>
> head_idx: 157, id: 0, len: 122, flags: 32768, vq idx: 1
> head_idx: 158, id: 0, len: 122, flags: 32768, vq idx: 1
> head_idx: 159, id: 0, len: 66, flags: 32768, vq idx: 1
> head_idx: 160, id: 1, len: 102, flags: 32768, vq idx: 1
>
> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf_packed(),
> I am printing the id, last_used index, used wrap counter and vq_idx.
> These are the lines just before the crash:
>
> id: 0, last_used: 158, used_wrap_counter: 0, vq idx: 1
> id: 0, last_used: 159, used_wrap_counter: 0, vq idx: 1
> id: 0, last_used: 160, used_wrap_counter: 0, vq idx: 1
> id: 1, last_used: 161, used_wrap_counter: 0, vq idx: 1
>
> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush() [2], I am
> tracking the values of i and vq_idx in the outer do..while() loop as
> well as in the inner while(true) loop. The value of i is used as the
> "idx" in virtqueue_fill() [3] and as the "count" in virtqueue_flush()
> [4]. Lines printed in each iteration of the outer do...while loop are
> enclosed between "===" lines. These are the lines just before the
> crash:

I'd print the VirtQueueElement members too. It seems you're super close
to fixing it :).

Thanks!
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> out_loop: i: 1, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> out_loop: i: 1, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> in_loop: i: 2, vq idx: 1
> out_loop: i: 2, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> out_loop: i: 0, vq idx: 1
>
> I have only investigated which descriptors the kernel uses. I'll also
> check which descriptors are made available by the kernel. I'll let you
> know what I find.
>
> Thanks,
> Sahil
>
> [1] https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1727
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L499
> [3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1008
> [4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1147