On Wed, May 14, 2025 at 8:22 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> Apologies, I haven't been in touch for a while. I have an update that
> I would like to give.
>
> On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
> > On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
> >>> On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>>> I managed to fix a few issues while testing this patch series.
> >>>> There is still one issue that I am unable to resolve. I thought
> >>>> I would send this patch series for review in case I have missed
> >>>> something.
> >>>>
> >>>> The issue is that this patch series does not work every time. I
> >>>> am able to ping L0 from L2 and vice versa via packed SVQ when it
> >>>> works.
> >>>
> >>> So we're on a very good track then!
> >>>
> >>>> When this doesn't work, both VMs throw a "Destination Host
> >>>> Unreachable" error. This is sometimes (not always) accompanied
> >>>> by the following kernel error (thrown by the L2 kernel):
> >>>>
> >>>> virtio_net virtio1: output.0:id 1 is not a head!
> >>>
> >>> How many packets have been sent or received before hitting this? If
> >>> the answer to that is "the vq size", maybe there is a bug in the code
> >>> that handles the wraparound of the packed vq, as the used and avail
> >>> flags need to be twisted. You can count them in the SVQ code.
> >>
> >> I did a lot more testing. This issue is quite unpredictable in terms
> >> of the time at which it appears after booting L2. So far, it has
> >> almost always appeared after booting L2. Even when pinging works,
> >> the issue appears after several seconds of pinging.
> >
> > Maybe you can speed it up with ping -f?
>
> Thank you, I was able to run tests much faster with the -f option. So
> far, I have noticed that the RX queue does not give problems. When all
> the descriptors are used, it is able to wrap around without issues.
>
> >> The total number of SVQ descriptors varied in every test run, but in
> >> every case, all 256 indices were filled in the descriptor region for
> >> the vq with vq_idx = 0. This is the RX vq, right?
> >
> > Right!
>
> The TX queue seems to be problematic. More on this below.
>
> >> This was filled while L2 was booting. In the case when the ctrl vq
> >> is disabled, I am not sure what is responsible for filling the vqs
> >> in the data plane during booting.
> >
> > The nested guest's driver fills the rx queue at startup. After that,
> > the nested guest kicks and SVQ receives the descriptors. It copies
> > the descriptors to the shadow virtqueue and then kicks L0 QEMU.
>
> Understood.
>
> >> =====
> >> The issue is hit most frequently when the following commands are run
> >> in L0:
> >>
> >> $ ip addr add 111.1.1.1/24 dev tap0
> >> $ ip link set tap0 up
> >>
> >> or when running the following in L2:
> >>
> >> # ip addr add 111.1.1.2/24 dev eth0
> >
> > I guess those are able to start the network, aren't they?
>
> Yes, that's correct.
>
> >> The other vq (vq_idx = 1) is not filled completely before the issue
> >> is hit. I have been noting down the numbers and here is an example:
> >>
> >> 295 descriptors were added individually to the queues, i.e., there
> >> were no chains (vhost_svq_add_packed)
> >> |_ 256 additions in vq_idx = 0, all with unique ids
> >> |    |---- 27 descriptors (ids 0 through 26) were received later
> >> |          from the device (vhost_svq_get_buf_packed)
> >> |_ 39 additions in vq_idx = 1
> >>      |_ 13 descriptors had id = 0
> >>      |_ 26 descriptors had id = 1
> >>      |---- All descriptors were received at some point from the
> >>            device (vhost_svq_get_buf_packed)
> >>
> >> There was one case in which vq_idx = 0 had wrapped around. I verified
> >> that the flags were set appropriately during the wrap (avail and used
> >> flags were flipped as expected).
> >
> > Ok, sounds like you're able to reach it before filling the queue. I'd
> > go for debugging notifications for this one then. More on this below.
>
> >> =====
> >> The next common situation where this issue is hit is during startup.
> >> Before L2 can finish booting successfully, this error is thrown:
> >>
> >> virtio_net virtio1: output.0:id 0 is not a head!
> >>
> >> 258 descriptors were added individually to the queues during startup,
> >> i.e., there were no chains (vhost_svq_add_packed)
> >> |_ 256 additions in vq_idx = 0, all with unique ids
> >> |    |---- None of them were received by the device
> >> |          (vhost_svq_get_buf_packed)
> >> |_ 2 additions in vq_idx = 1
> >>      |_ id = 0 in index 0
> >>      |_ id = 1 in index 1
> >>      |---- Both descriptors were received at some point during
> >>            startup from the device (vhost_svq_get_buf_packed)
> >>
> >> =====
> >> Another case is after several seconds of pinging L0 from L2:
> >>
> >> [ 99.034114] virtio_net virtio1: output.0:id 0 is not a head!
> >
> > So the L2 guest sees a descriptor that it has not made available
> > previously. This can be caused by SVQ returning the same descriptor
> > twice, or by SVQ not filling the id or flags properly. It could also
> > be caused by us not protecting the write ordering in the ring, but I
> > don't see anything obviously wrong by looking at the code.
>
> >> 366 descriptors were added individually to the queues, i.e., there
> >> were no chains (vhost_svq_add_packed)
> >> |_ 289 additions in vq_idx = 0; wrap-around was observed, with avail
> >> |  and used flags inverted for 33 descriptors
> >> |    |---- 40 descriptors (ids 0 through 39) were received from the
> >> |          device (vhost_svq_get_buf_packed)
> >> |_ 77 additions in vq_idx = 1
> >>      |_ 76 descriptors had id = 0
> >>      |_ 1 descriptor had id = 1
> >>      |---- All 77 descriptors were received at some point from the
> >>            device (vhost_svq_get_buf_packed)
> >>
> >> I am not entirely sure now if there's an issue in the packed vq
> >> implementation in QEMU or if this is being caused by some sort of
> >> race condition in Linux.
> >>
> >> "id is not a head" is being thrown because vq->packed.desc_state[id].data
> >> doesn't exist for the corresponding id in Linux [1]. But QEMU seems to
> >> have stored some data for this id via vhost_svq_add() [2]. Linux sets
> >> the value of vq->packed.desc_state[id].data in its version of
> >> virtqueue_add_packed() [3].
> >
> > Let's keep debugging further. Can you trace the ids that the L2 kernel
> > makes available, and then the ones that it uses? At the same time, can
> > you trace the ids that the SVQ sees in vhost_svq_get_buf and the ones
> > that it flushes? This allows us to check the set of available
> > descriptors at any given time.
>
> In the Linux kernel, I am printing which descriptor is received in which
> queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
> I see the following lines getting printed for the TX queue:
>
> [ 192.101591] output.0 -> id: 0
> [ 213.737417] output.0 -> id: 0
> [ 213.738714] output.0 -> id: 1
> [ 213.740093] output.0 -> id: 0
> [ 213.741521] virtio_net virtio1: output.0:id 0 is not a head!
I find it peculiar that it fails on the first descriptor with id 1. Do
you have any other descriptor with id 1 previously? Does it fail
consistently with id 1? You should have descriptors with id 1 and more
in the rx queue, and the code should not be able to tell the difference,
so it seems weird that it fails with tx. But who knows :).

> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_add_packed(), I
> am printing the head_idx, id, len, flags and vq_idx. Just before the
> crash, the following lines are printed:
>
> head_idx: 157, id: 0, len: 122, flags: 32768, vq idx: 1
> head_idx: 158, id: 0, len: 122, flags: 32768, vq idx: 1
> head_idx: 159, id: 0, len: 66, flags: 32768, vq idx: 1
> head_idx: 160, id: 1, len: 102, flags: 32768, vq idx: 1
>
> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf_packed(),
> I am printing the id, last_used index, used wrap counter and vq_idx.
> These are the lines just before the crash:
>
> id: 0, last_used: 158, used_wrap_counter: 0, vq idx: 1
> id: 0, last_used: 159, used_wrap_counter: 0, vq idx: 1
> id: 0, last_used: 160, used_wrap_counter: 0, vq idx: 1
> id: 1, last_used: 161, used_wrap_counter: 0, vq idx: 1
>
> In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush() [2], I am
> tracking the values of i and vq_idx in the outer do..while() loop as
> well as in the inner while(true) loop. The value of i is used as the
> "idx" in virtqueue_fill() [3] and as the "count" in virtqueue_flush()
> [4]. Lines printed in each iteration of the outer do...while loop are
> enclosed between "===" lines. These are the lines just before the
> crash:

I'd print the VirtQueueElement members too. It seems you're super close
to fixing it :).

Thanks!
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> out_loop: i: 1, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> out_loop: i: 1, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> in_loop: i: 1, vq idx: 1
> in_loop: i: 2, vq idx: 1
> out_loop: i: 2, vq idx: 1
> ===
> in_loop: i: 0, vq idx: 1
> out_loop: i: 0, vq idx: 1
>
> I have only investigated which descriptors the kernel uses. I'll also
> check which descriptors are made available by the kernel. I'll let you
> know what I find.
>
> Thanks,
> Sahil
>
> [1] https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1727
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L499
> [3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1008
> [4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1147