On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> It's been a while since I sent an email. I thought I would send an update
> to keep you in the loop.
>
> I have been comparing svq's mechanism for split and packed vqs hoping to
> find something that might lead to the source of the issue.
>
> One thing worth noting is that when I use kernel version 6.8.5 for testing,
> the crashes are far more frequent. In kernel version 6.15.0-rc3+, it's much
> harder to reproduce.
>
> On 5/15/25 11:49 AM, Eugenio Perez Martin wrote:
> > On Wed, May 14, 2025 at 8:22 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
> >>> On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambi...@gmail.com>
> >>> wrote:
> >>>> On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
> >>>>> On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com>
> >>>>> wrote:
> >>>>>> I managed to fix a few issues while testing this patch series.
> >>>>>> There is still one issue that I am unable to resolve. I thought
> >>>>>> I would send this patch series for review in case I have missed
> >>>>>> something.
> >>>>>>
> >>>>>> The issue is that this patch series does not work every time. I
> >>>>>> am able to ping L0 from L2 and vice versa via packed SVQ when it
> >>>>>> works.
> >>>>>>
> >>>>>> When this doesn't work, both VMs throw a "Destination Host
> >>>>>> Unreachable" error. This is sometimes (not always) accompanied
> >>>>>> by the following kernel error (thrown by L2-kernel):
> >>>>>>
> >>>>>> virtio_net virtio1: output.0:id 1 is not a head!
> >>>>>>
> >>
> >> The TX queue seems to be problematic. More on this below.
>
> Sometimes RX also results in this crash, but it seems to be less frequent.
>
> >>>> This was filled while L2
> >>>> was booting. In the case when the ctrl vq is disabled, I am not sure
> >>>> what is responsible for filling the vqs in the data plane during
> >>>> booting.
> >>>>
> >>> The nested guest's driver fills the rx queue at startup. After that,
> >>> that nested guest kicks and SVQ receives the descriptors. It copies
> >>> the descriptors to the shadow virtqueue and then kicks L0 QEMU.
> >>
> >> Understood.
> >>
> >>>> The other vq (vq_idx=1) is not filled completely before the issue is
> >>>> hit.
> >>>> I have been noting down the numbers and here is an example:
> >>>>
> >>>> 295 descriptors were added individually to the queues i.e., there were
> >>>> no chains (vhost_svq_add_packed)
> >>>> |_ 256 additions in vq_idx = 0, all with unique ids
> >>>>      |---- 27 descriptors (ids 0 through 26) were received later from
> >>>>            the device (vhost_svq_get_buf_packed)
> >>>> |_ 39 additions in vq_idx = 1
> >>>>      |_ 13 descriptors had id = 0
> >>>>      |_ 26 descriptors had id = 1
> >>>>      |---- All descriptors were received at some point from the device
> >>>>            (vhost_svq_get_buf_packed)
> >>>>
> >>>> There was one case in which vq_idx=0 had wrapped around. I verified
> >>>> that flags were set appropriately during the wrap (avail and used flags
> >>>> were flipped as expected).
> >>>>
> >>>
> >>> Ok sounds like you're able to reach it before filling the queue. I'd
> >>> go for debugging notifications for this one then. More on this below.
> >>>
> >>>> =====
> >>>> The next common situation where this issue is hit is during startup.
> >>>> Before L2 can finish booting successfully, this error is thrown:
> >>>>
> >>>> virtio_net virtio1: output.0:id 0 is not a head!
> >>>>
> >>>> 258 descriptors were added individually to the queues during startup
> >>>> (there were no chains) (vhost_svq_add_packed)
> >>>> |_ 256 additions in vq_idx = 0, all with unique ids
> >>>>      |---- None of them were received by the device
> >>>>            (vhost_svq_get_buf_packed)
> >>>> |_ 2 additions in vq_idx = 1
> >>>>      |_ id = 0 in index 0
> >>>>      |_ id = 1 in index 1
> >>>>      |---- Both descriptors were received at some point during startup
> >>>>            from the device (vhost_svq_get_buf_packed)
> >>>>
> >>>> =====
> >>>> Another case is after several seconds of pinging L0 from L2.
> >>>>
> >>>> [ 99.034114] virtio_net virtio1: output.0:id 0 is not a head!
> >>>>
> >>>
> >>> So the L2 guest sees a descriptor it has not made available
> >>> previously. This can be caused because SVQ returns the same descriptor
> >>> twice, or it doesn't fill the id or flags properly. It can also be
> >>> caused because we're not protecting the write ordering in the ring,
> >>> but I don't see anything obviously wrong by looking at the code.
> >>>
> >>>> 366 descriptors were added individually to the queues i.e., there were
> >>>> no chains (vhost_svq_add_packed)
> >>>> |_ 289 additions in vq_idx = 0, wrap-around was observed with avail and
> >>>>    used flags inverted for 33 descriptors
> >>>> |     |---- 40 descriptors (ids 0 through 39) were received from the
> >>>>             device (vhost_svq_get_buf_packed)
> >>>> |_ 77 additions in vq_idx = 1
> >>>>      |_ 76 descriptors had id = 0
> >>>>      |_ 1 descriptor had id = 1
> >>>>      |---- all 77 descriptors were received at some point from the
> >>>>            device (vhost_svq_get_buf_packed)
> >>>>
> >>>>
> >>>> "id is not a head" is being thrown because vq->packed.desc_state[id].data
> >>>> doesn't exist for the corresponding id in Linux [1]. But QEMU seems to have
> >>>> stored some data for this id via vhost_svq_add() [2]. Linux sets the value
> >>>> of vq->packed.desc_state[id].data in its version of
> >>>> virtqueue_add_packed() [3].
> >>>>
> >>>
> >>> Let's keep debugging further. Can you trace the ids that the L2 kernel
> >>> makes available, and then the ones that it uses? At the same time, can
> >>> you trace the ids that the svq sees in vhost_svq_get_buf and the ones
> >>> that flushes? This allows us to check the set of available descriptors
> >>> at any given time.
> >>>
> >> In the linux kernel, I am printing which descriptor is received in which
> >> queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
> >> I see the following lines getting printed for the TX queue:
> >>
> >> [ 192.101591] output.0 -> id: 0
> >> [ 213.737417] output.0 -> id: 0
> >> [ 213.738714] output.0 -> id: 1
> >> [ 213.740093] output.0 -> id: 0
> >> [ 213.741521] virtio_net virtio1: output.0:id 0 is not a head!
> >>
> >
> > I find it particular that it is the first descriptor with id 1. Do you
> > have any other descriptor with id 1 previously? Does it fail
> > consistently with id 1?
>
> Yes, the descriptor with id 1 was used previously in TX. It varies between
> test runs. It has failed with other ids as well during some test runs. In
> one test run, it failed with id 17. I think there's an off-by-one bug here.
> It crashes when it receives id 'x - 1' instead of 'x'.
> > You should have descriptors with id 1 and more in the rx queue and the
> > code should not be able to tell the difference, so it seems weird it
> > fails with tx. But who knows :).
> Oh, I thought it would be able to differentiate between them since it knows
> which vq->idx it's coming from.
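
To make that failure mode concrete, here is a minimal standalone sketch of the
driver-side bookkeeping that ends in "id X is not a head!". It is not the real
virtio_ring.c or SVQ code (toy_add(), toy_get_buf() and RING_SIZE are made-up
names for illustration); it only models the rule that desc_state[id].data is
set when an id is made available and cleared when it is used, so an id that
comes back without having been re-added fails the lookup:

#include <stdio.h>

#define RING_SIZE 256

/* Stand-in for the driver's per-id state: only the fields that matter here. */
struct toy_desc_state {
    void *data;             /* set at add time, cleared when the id is used */
    unsigned int ndescs;    /* length of the chain starting at this id */
};

static struct toy_desc_state desc_state[RING_SIZE];

/* The driver makes a buffer available under a free id. */
static void toy_add(unsigned int id, void *cookie)
{
    desc_state[id].data = cookie;
    desc_state[id].ndescs = 1;
}

/* The driver processes a used id coming back from the device. */
static void *toy_get_buf(unsigned int id)
{
    if (id >= RING_SIZE || !desc_state[id].data) {
        fprintf(stderr, "id %u is not a head!\n", id);
        return NULL;
    }
    void *cookie = desc_state[id].data;
    desc_state[id].data = NULL;    /* the id is free again for the driver */
    return cookie;
}

int main(void)
{
    int tx0 = 0, tx1 = 1;

    toy_add(0, &tx0);
    toy_add(1, &tx1);

    toy_get_buf(0);    /* ok */
    toy_get_buf(1);    /* ok */
    toy_get_buf(0);    /* never re-added: prints "id 0 is not a head!" */
    return 0;
}

This is consistent with the tx trace above: the final id 0 arrives at a point
where, from the driver's point of view, it had not been made available again.
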
>
> I think there's something off in the way "free_head", "last_used_idx" and
> "desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
>
> In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started receiving
> them in order till id 8. At this point it received id 7 again for some reason
> and then crashed.
>
> L2:
>
> [ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
> [ 1641.130621] (notify) output.0 -> function will return true
> [ 1641.132022] output.0 -> id: 0
> [ 1739.502358] input.0 -> id: 0
> [ 1739.503003] input.0 -> id: 1
> [ 1739.562024] input.0 -> id: 2
> [ 1739.578682] input.0 -> id: 3
> [ 1739.661913] input.0 -> id: 4
> [ 1739.828796] input.0 -> id: 5
> [ 1739.829789] input.0 -> id: 6
> [ 1740.078757] input.0 -> id: 7
> [ 1740.079749] input.0 -> id: 8
> [ 1740.080382] input.0 -> id: 7  <---- Received 7 again
> [ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
>
> QEMU logs (vhost_svq_get_buf_packed):
> ------
> size : svq->vring.num
> len : svq->vring_packed.vring.desc[last_used].len
> id : svq->vring_packed.vring.desc[last_used].id
> num : svq->desc_state[id].ndescs
> last_used_chain : Result of vhost_svq_last_desc_of_chain(svq, num, id) [2]
> free_head : svq->free_head
> last_used : (last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) + num
> used_wrap_counter : !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
> ------
>
> size: 256, len: 102, id: 0, vq idx: 0
> id: 0, last_used_chain: 0, free_head: 0, vq idx: 0
> num: 1, free_head: 0, id: 0, last_used: 1, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 74, id: 1, vq idx: 0
> id: 1, last_used_chain: 1, free_head: 0, vq idx: 0
> num: 1, free_head: 1, id: 1, last_used: 2, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 102, id: 2, vq idx: 0
> id: 2, last_used_chain: 2, free_head: 1, vq idx: 0
> num: 1, free_head: 2, id: 2, last_used: 3, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 3, vq idx: 0
> id: 3, last_used_chain: 3, free_head: 2, vq idx: 0
> num: 1, free_head: 3, id: 3, last_used: 4, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 74, id: 4, vq idx: 0
> id: 4, last_used_chain: 4, free_head: 3, vq idx: 0
> num: 1, free_head: 4, id: 4, last_used: 5, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 5, vq idx: 0
> id: 5, last_used_chain: 5, free_head: 4, vq idx: 0
> num: 1, free_head: 5, id: 5, last_used: 6, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 6, vq idx: 0
> id: 6, last_used_chain: 6, free_head: 5, vq idx: 0
> num: 1, free_head: 6, id: 6, last_used: 7, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 7, vq idx: 0
> id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
> num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 8, vq idx: 0
> id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
> num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 98, id: 9, vq idx: 0
> id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
> num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 10, vq idx: 0
> id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
> num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq idx: 0
>
> I have a few more ideas of what to do. I'll let you know if I find something
> else.
>

I cannot find anything wrong just by inspection. What about printing all of
desc_state and desc_next at each svq_add and get_buf, to check for
inconsistencies?
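
Something along these lines, for example. This is only a standalone sketch:
svq_debug_view and svq_dump() are made-up names, and the struct merely mirrors
the fields referenced in this thread (free_head, last_used_idx with the
wrap-counter bit, desc_state[].ndescs and desc_next[]). In the real series the
helper would take the shadow virtqueue itself and be called at the end of
vhost_svq_add_packed() and vhost_svq_get_buf_packed():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VQ_NUM 8                          /* small ring so the dump stays readable */
#define VRING_PACKED_EVENT_F_WRAP_CTR 15  /* same bit as in the virtio headers */

struct svq_debug_view {
    uint16_t free_head;
    uint16_t last_used_idx;               /* used index plus wrap-counter bit */
    struct {
        bool in_use;                      /* stands in for desc_state[i].data != NULL */
        unsigned int ndescs;
    } desc_state[VQ_NUM];
    uint16_t desc_next[VQ_NUM];
};

static void svq_dump(const struct svq_debug_view *svq, const char *where)
{
    unsigned int last_used =
        svq->last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
    int wrap = !!(svq->last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR));

    printf("%s: free_head=%u last_used=%u used_wrap_counter=%d\n",
           where, (unsigned int)svq->free_head, last_used, wrap);
    for (unsigned int i = 0; i < VQ_NUM; i++) {
        printf("  [%u] in_use=%d ndescs=%u next=%u\n",
               i, svq->desc_state[i].in_use, svq->desc_state[i].ndescs,
               (unsigned int)svq->desc_next[i]);
    }
}

int main(void)
{
    struct svq_debug_view svq = { .free_head = 0 };

    for (unsigned int i = 0; i < VQ_NUM; i++) {
        svq.desc_next[i] = (i + 1) % VQ_NUM;
    }

    /* id 0 made available ... */
    svq.desc_state[0].in_use = true;
    svq.desc_state[0].ndescs = 1;
    svq.free_head = 1;
    svq_dump(&svq, "after add(id=0)");

    /* ... and returned as used by the device. */
    svq.desc_state[0].in_use = false;
    svq.last_used_idx = 1 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
    svq_dump(&svq, "after get_buf(id=0)");

    return 0;
}

Diffing consecutive dumps should show right away whether free_head, desc_next
or the wrap counter goes out of sync before the guest is handed a stale id.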