On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> It's been a while since I sent an email. I thought I would send an update
> to keep you in the loop.
>
> I have been comparing SVQ's mechanism for split and packed vqs, hoping to
> find something that might point to the source of the issue.
>
> One thing worth noting is that when I use kernel version 6.8.5 for testing,
> the crashes are far more frequent. In kernel version 6.15.0-rc3+, it's much
> harder to reproduce.
>
> On 5/15/25 11:49 AM, Eugenio Perez Martin wrote:
> > On Wed, May 14, 2025 at 8:22 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
> >>> On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambi...@gmail.com> 
> >>> wrote:
> >>>> On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
> >>>>> On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com> 
> >>>>> wrote:
> >>>>>> I managed to fix a few issues while testing this patch series.
> >>>>>> There is still one issue that I am unable to resolve. I thought
> >>>>>> I would send this patch series for review in case I have missed
> >>>>>> something.
> >>>>>>
> >>>>>> The issue is that this patch series does not work every time. I
> >>>>>> am able to ping L0 from L2 and vice versa via packed SVQ when it
> >>>>>> works.
> >>>>>>
> >>>>>> When this doesn't work, both VMs throw a "Destination Host
> >>>>>> Unreachable" error. This is sometimes (not always) accompanied
> >>>>>> by the following kernel error (thrown by L2-kernel):
> >>>>>>
> >>>>>> virtio_net virtio1: output.0:id 1 is not a head!
> >>>>>>
> >>
> >> The TX queue seems to be problematic. More on this below.
>
> Sometimes RX also results in this crash, but it seems to be less frequent.
>
> >>>> This was filled while L2
> >>>> was booting. In the case when the ctrl vq is disabled, I am not sure
> >>>> what is responsible for filling the vqs in the data plane during
> >>>> booting.
> >>>>
> >>> The nested guest's driver fills the rx queue at startup. After that,
> >>> that nested guest kicks and SVQ receives the descriptors. It copies
> >>> the descriptors to the shadow virtqueue and then kicks L0 QEMU.
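> >>>
> >>> Roughly, that path looks like this (simplified from
> >>> hw/virtio/vhost-shadow-virtqueue.c; error handling and descriptor
> >>> accounting are omitted, so treat it as an illustration rather than the
> >>> exact code):
> >>>
> >>>     static void svq_forward_guest_avail(VhostShadowVirtqueue *svq)
> >>>     {
> >>>         for (;;) {
> >>>             /* Pop the next element the nested guest made available */
> >>>             VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> >>>             if (!elem) {
> >>>                 break;              /* guest avail ring drained */
> >>>             }
> >>>             /* Copy its descriptors into the shadow vring... */
> >>>             vhost_svq_add(svq, elem->out_sg, elem->out_num,
> >>>                           elem->in_sg, elem->in_num, elem);
> >>>             /* ...and notify the device (L0 QEMU) */
> >>>             vhost_svq_kick(svq);
> >>>         }
> >>>     }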
> >>
> >> Understood.
> >>
> >>>> The other vq (vq_idx=1) is not filled completely before the issue is hit.
> >>>> I have been noting down the numbers and here is an example:
> >>>>
> >>>> 295 descriptors were added individually to the queues, i.e., there were
> >>>> no chains (vhost_svq_add_packed)
> >>>> |_ 256 additions in vq_idx = 0, all with unique ids
> >>>>        |---- 27 descriptors (ids 0 through 26) were received later from
> >>>>              the device (vhost_svq_get_buf_packed)
> >>>> |_ 39 additions in vq_idx = 1
> >>>>        |_ 13 descriptors had id = 0
> >>>>        |_ 26 descriptors had id = 1
> >>>>        |---- All descriptors were received at some point from the device
> >>>>              (vhost_svq_get_buf_packed)
> >>>>
> >>>> There was one case in which vq_idx=0 had wrapped around. I verified
> >>>> that flags were set appropriately during the wrap (avail and used flags
> >>>> were flipped as expected).
> >>>>
> >>>
> >>> OK, it sounds like you're able to hit the issue before filling the queue.
> >>> I'd go for debugging notifications for this one then. More on this below.
> >>>
> >>>> =====
> >>>> The next common situation where this issue is hit is during startup.
> >>>> Before L2 can finish booting successfully, this error is thrown:
> >>>>
> >>>> virtio_net virtio1: output.0:id 0 is not a head!
> >>>>
> >>>> 258 descriptors were added individually to the queues during startup
> >>>> (there were no chains) (vhost_svq_add_packed)
> >>>> |_ 256 additions in vq_idx = 0, all with unique ids
> >>>>       |---- None of them were received by the device
> >>>>             (vhost_svq_get_buf_packed)
> >>>> |_ 2 additions in vq_idx = 1
> >>>>       |_ id = 0 in index 0
> >>>>       |_ id = 1 in index 1
> >>>>       |---- Both descriptors were received at some point during startup
> >>>>             from the device (vhost_svq_get_buf_packed)
> >>>>
> >>>> =====
> >>>> Another case is after several seconds of pinging L0 from L2.
> >>>>
> >>>> [   99.034114] virtio_net virtio1: output.0:id 0 is not a head!
> >>>>
> >>>
> >>> So the L2 guest sees a descriptor it has not made available
> >>> previously. This can be caused by SVQ returning the same descriptor
> >>> twice, or by it not filling the id or flags properly. It could also be
> >>> caused by not protecting the write ordering in the ring,
> >>> but I don't see anything obviously wrong by looking at the code.
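> >>>
> >>> For reference, the ordering that matters when the driver side (SVQ in
> >>> our case) exposes a packed descriptor is that the body has to land
> >>> before the flags, since flipping the AVAIL/USED bits is what publishes
> >>> the entry. A minimal sketch of that rule, illustration only and not the
> >>> series' code, using the standard virtio_ring.h names:
> >>>
> >>>     /* Publish one packed descriptor from the driver side: AVAIL must
> >>>      * equal the driver's wrap counter, USED its inverse, and the flags
> >>>      * write must come after the body (hence the barrier). */
> >>>     static void expose_desc_packed(struct vring_packed_desc *desc,
> >>>                                    uint64_t addr, uint32_t len,
> >>>                                    uint16_t id, bool writable,
> >>>                                    bool avail_wrap_counter)
> >>>     {
> >>>         uint16_t flags = writable ? VRING_DESC_F_WRITE : 0;
> >>>
> >>>         flags |= avail_wrap_counter ? (1 << VRING_PACKED_DESC_F_AVAIL)
> >>>                                     : (1 << VRING_PACKED_DESC_F_USED);
> >>>
> >>>         desc->addr = cpu_to_le64(addr);
> >>>         desc->len = cpu_to_le32(len);
> >>>         desc->id = cpu_to_le16(id);
> >>>         smp_wmb();                      /* body before flags */
> >>>         desc->flags = cpu_to_le16(flags);
> >>>     }
> >>>
> >>> The same rule applies in the other direction when used entries are
> >>> written back for the guest.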
> >>>
> >>>> 366 descriptors were added individually to the queues, i.e., there were
> >>>> no chains (vhost_svq_add_packed)
> >>>> |_ 289 additions in vq_idx = 0, wrap-around was observed with avail and
> >>>> |  used flags inverted for 33 descriptors
> >>>> |  |---- 40 descriptors (ids 0 through 39) were received from the
> >>>> |        device (vhost_svq_get_buf_packed)
> >>>> |_ 77 additions in vq_idx = 1
> >>>>        |_ 76 descriptors had id = 0
> >>>>        |_ 1 descriptor had id = 1
> >>>>        |---- all 77 descriptors were received at some point from the
> >>>>              device (vhost_svq_get_buf_packed)
> >>>>
> >>>>
> >>>> "id is not a head" is being thrown because vq->packed.desc_state[id].data
> >>>> doesn't exist for the corresponding id in Linux [1]. But QEMU seems to 
> >>>> have
> >>>> stored some data for this id via vhost_svq_add() [2]. Linux sets the 
> >>>> value
> >>>> of vq->packed.desc_state[id].data in its version of 
> >>>> virtqueue_add_packed() [3].
> >>>>
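> >>>> For context, the check that throws it in drivers/virtio/virtio_ring.c
> >>>> looks roughly like this (paraphrased from memory, [1] has the exact
> >>>> code):
> >>>>
> >>>>     /* virtqueue_get_buf_ctx_packed(), roughly: */
> >>>>     id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
> >>>>     *len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
> >>>>
> >>>>     if (unlikely(id >= vq->packed.vring.num)) {
> >>>>         BAD_RING(vq, "id %u out of range\n", id);
> >>>>         return NULL;
> >>>>     }
> >>>>     if (unlikely(!vq->packed.desc_state[id].data)) {
> >>>>         BAD_RING(vq, "id %u is not a head!\n", id);
> >>>>         return NULL;
> >>>>     }
> >>>>
> >>>> So any id that is handed back to the guest while its desc_state slot is
> >>>> already empty (for example, the same id returned twice) will trip it.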
> >>>
> >>> Let's keep debugging further. Can you trace the ids that the L2 kernel
> >>> makes available, and then the ones that it uses? At the same time, can
> >>> you trace the ids that the svq sees in vhost_svq_get_buf and the ones
> >>> that it flushes? This allows us to check the set of available descriptors
> >>> at any given time.
> >>>
> >> In the Linux kernel, I am printing which descriptor is received in which
> >> queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
> >> I see the following lines getting printed for the TX queue:
> >>
> >> [  192.101591] output.0 -> id: 0
> >> [  213.737417] output.0 -> id: 0
> >> [  213.738714] output.0 -> id: 1
> >> [  213.740093] output.0 -> id: 0
> >> [  213.741521] virtio_net virtio1: output.0:id 0 is not a head!
> >>
> >
> > I find it peculiar that it is the first descriptor with id 1. Did you
> > have any other descriptor with id 1 previously? Does it fail
> > consistently with id 1?
>
> Yes, the descriptor with id 1 was used previously in TX. It varies between
> test runs. It has failed with other ids as well during some test runs. In
> one test run, it failed with id 17. I think there's an off-by-one bug here.
> It crashes when it receives id 'x - 1' instead of 'x'.
>
> > You should have descriptors with id 1 and more in the rx queue and the
> > code should not be able to tell the difference, so it seems weird that it
> > fails with tx. But who knows :).
>
> Oh, I thought it would be able to differentiate between them since it knows
> which vq->idx it's coming from.
>
> I think there's something off in the way "free_head", "last_used_idx" and
> "desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
>
> In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started receiving
> them in order up to id 8. At that point, it received id 7 again for some reason
> and then crashed.
>
> L2:
>
> [ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
> [ 1641.130621] (notify) output.0 -> function will return true
> [ 1641.132022] output.0 -> id: 0
> [ 1739.502358] input.0 -> id: 0
> [ 1739.503003] input.0 -> id: 1
> [ 1739.562024] input.0 -> id: 2
> [ 1739.578682] input.0 -> id: 3
> [ 1739.661913] input.0 -> id: 4
> [ 1739.828796] input.0 -> id: 5
> [ 1739.829789] input.0 -> id: 6
> [ 1740.078757] input.0 -> id: 7
> [ 1740.079749] input.0 -> id: 8
> [ 1740.080382] input.0 -> id: 7    <----Received 7 again
> [ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
>
> QEMU logs (vhost_svq_get_buf_packed):
> ------
> size              : svq->vring.num
> len               : svq->vring_packed.vring.desc[last_used].len
> id                : svq->vring_packed.vring.desc[last_used].id
> num               : svq->desc_state[id].ndescs
> last_used_chain   : Result of vhost_svq_last_desc_of_chain(svq, num, id) [2]
> free_head         : svq->free_head
> last_used         : (last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) + num
> used_wrap_counter : !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
> ------
>
> size: 256, len: 102, id: 0, vq idx: 0
> id: 0, last_used_chain: 0, free_head: 0, vq idx: 0
> num: 1, free_head: 0, id: 0, last_used: 1, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 74, id: 1, vq idx: 0
> id: 1, last_used_chain: 1, free_head: 0, vq idx: 0
> num: 1, free_head: 1, id: 1, last_used: 2, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 102, id: 2, vq idx: 0
> id: 2, last_used_chain: 2, free_head: 1, vq idx: 0
> num: 1, free_head: 2, id: 2, last_used: 3, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 3, vq idx: 0
> id: 3, last_used_chain: 3, free_head: 2, vq idx: 0
> num: 1, free_head: 3, id: 3, last_used: 4, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 74, id: 4, vq idx: 0
> id: 4, last_used_chain: 4, free_head: 3, vq idx: 0
> num: 1, free_head: 4, id: 4, last_used: 5, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 5, vq idx: 0
> id: 5, last_used_chain: 5, free_head: 4, vq idx: 0
> num: 1, free_head: 5, id: 5, last_used: 6, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 6, vq idx: 0
> id: 6, last_used_chain: 6, free_head: 5, vq idx: 0
> num: 1, free_head: 6, id: 6, last_used: 7, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 82, id: 7, vq idx: 0
> id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
> num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 8, vq idx: 0
> id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
> num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 98, id: 9, vq idx: 0
> id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
> num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
> ------
> size: 256, len: 104, id: 10, vq idx: 0
> id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
> num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq idx: 0
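>
> For comparison, this is the advance of last_used_idx I would expect after
> consuming 'num' descriptors, mirroring how the guest driver advances its
> own counter, with the wrap counter kept in bit VRING_PACKED_EVENT_F_WRAP_CTR.
> This is only a sketch of the intended logic, not the actual patch code:
>
>     last_used += num;
>     if (last_used >= svq->vring.num) {
>         last_used -= svq->vring.num;       /* wrapped past the ring end */
>         used_wrap_counter ^= 1;            /* flip the used wrap counter */
>     }
>     svq->last_used_idx = last_used |
>                          (used_wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR);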
>
> I have a few more ideas of what to do. I'll let you know if I find something
> else.
>

I cannot find anything just by inspection. What about printing all of the
desc_state and desc_next entries, to check for inconsistencies in each
svq_add and get_buf?
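
Something quick like the sketch below is what I have in mind. It is only a
rough illustration: the field names (desc_state[].elem, desc_state[].ndescs,
desc_next[], free_head, last_used_idx) come from the split SVQ code and from
your logs above, so adjust them to whatever your packed series actually uses:

    /* Rough debug helper, to be called at the end of vhost_svq_add() and
     * vhost_svq_get_buf(); it dumps the free list and per-id bookkeeping so
     * a duplicated or stale id shows up as an inconsistency between calls. */
    static void vhost_svq_dump_state(const VhostShadowVirtqueue *svq,
                                     const char *caller)
    {
        for (unsigned int i = 0; i < svq->vring.num; ++i) {
            fprintf(stderr, "%s: i=%u elem=%p ndescs=%u next=%u\n",
                    caller, i, (void *)svq->desc_state[i].elem,
                    svq->desc_state[i].ndescs, (unsigned)svq->desc_next[i]);
        }
        fprintf(stderr, "%s: free_head=%u last_used_idx=0x%x\n",
                caller, (unsigned)svq->free_head, (unsigned)svq->last_used_idx);
    }

Comparing consecutive dumps around the point where id 7 is returned twice
should show whether the free list or desc_state is already inconsistent in
svq_add or only in get_buf.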

