On Thu, Oct 14, 2021 at 7:38 PM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:
>
> On 10/14/21 13:25, Li Feng wrote:
> > Thank you for your response.
> >
> > On Thu, Oct 14, 2021 at 4:17 PM Maxime Coquelin
> > <maxime.coque...@redhat.com> wrote:
> >>
> >> Hi Li,
> >>
> >> Adding Jin Yu who introduced this function.
> >>
> >> On 8/27/21 07:12, Li Feng wrote:
> >>> When getting reqs from the avail ring, the id may exceed inflight
> >>> queue size. Then the dpdk will crash forever.
> >>
> >> You need to add Fixes tag and Cc sta...@dpdk.org so that it can be
> >> backported.
> > OK, I will send the v2 version.
> >
> >>
> >>> Signed-off-by: Li Feng <fen...@smartx.com>
> >>> ---
> >>> lib/vhost/vhost_user.c | 10 ++++++++--
> >>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >>> index 29a4c9af60..f09d0f6a48 100644
> >>> --- a/lib/vhost/vhost_user.c
> >>> +++ b/lib/vhost/vhost_user.c
> >>> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct
> >>> virtio_net *dev,
> >>> last_io = inflight_split->last_inflight_io;
> >>>
> >>> if (inflight_split->used_idx != used->idx) {
> >>> - inflight_split->desc[last_io].inflight = 0;
> >>> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >>> + if (unlikely(last_io >= inflight_split->desc_num)) {
> >>> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"'
> >>> exceeds inflight "
> >>> + "queue size (%"PRIu16").\n", last_io,
> >>> + inflight_split->desc_num);
> >>
> >> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> >> instead of just logging an error?
> > I think ignoring the error is ok. No one could handle this error correctly.
> > At this time the guest virtio driver of this virtqueue may be in an
> > incorrect state.
>
> Not sure to understand how it can happen.
> But I see that last_io is actually vq->inflight_split->last_inflight_io,
> which is set only by rte_vhost_set_last_inflight_io_split() API.
The corrupted value comes from the frontend driver. I hit this issue in my
environment, where a VM hung, so I guess the bad value came from that VM.
>
> Shouldn't there be a sanity check there to ensure that last_inflight_io
> is smaller than desc_num value set by the frontend?
Yes, putting a check in rte_vhost_set_last_inflight_io_split is also ok.
I will send the v2 version that includes this.
Thanks.
>
> Returning an error is the right thing to do anyway.
OK.
>
> >>
> >>> + } else {
> >>> + inflight_split->desc[last_io].inflight = 0;
> >>> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >>> + }
> >>> inflight_split->used_idx = used->idx;
> >>> }
> >>>
> >>>
> >>
> >> Regards,
> >> Maxime
> >>
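
For reference, here is a rough sketch of the two checks discussed in this
thread: rejecting an out-of-range index when last_inflight_io is set, and
returning an error (not only logging) when the stored index is found to be
out of range later. This is not the v2 patch. The struct layouts, the fixed
array size and the function names set_last_inflight_io() and
check_inflight() are hypothetical stand-ins, used so the example compiles
without DPDK; only the field names last_inflight_io, desc_num, used_idx and
desc[].inflight are taken from the diff above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_DESC 8 /* hypothetical upper bound for this sketch only */

/* Hypothetical stand-in for one inflight descriptor entry. */
struct inflight_desc {
	uint8_t inflight;
};

/* Hypothetical stand-in for the split-ring inflight region. */
struct inflight_info_split {
	uint16_t desc_num;         /* queue size, must be <= SKETCH_MAX_DESC */
	uint16_t last_inflight_io; /* index of the last submitted request */
	uint16_t used_idx;
	struct inflight_desc desc[SKETCH_MAX_DESC];
};

/* Setter-side check: refuse an index outside the inflight queue. */
static int
set_last_inflight_io(struct inflight_info_split *inf, uint16_t idx)
{
	if (inf == NULL || idx >= inf->desc_num)
		return -1;

	inf->last_inflight_io = idx;
	return 0;
}

/*
 * Consumer-side check: if the stored index is out of range, log it and
 * return an error so the caller can fail the request, leaving the
 * inflight state untouched.
 */
static int
check_inflight(struct inflight_info_split *inf, uint16_t used_ring_idx)
{
	uint16_t last_io;

	assert(inf != NULL);
	last_io = inf->last_inflight_io;

	if (inf->used_idx != used_ring_idx) {
		if (last_io >= inf->desc_num) {
			fprintf(stderr,
				"last_inflight_io '%u' exceeds inflight queue size (%u)\n",
				(unsigned int)last_io,
				(unsigned int)inf->desc_num);
			return -1;
		}
		inf->desc[last_io].inflight = 0;
		inf->used_idx = used_ring_idx;
	}

	return 0;
}
```

With both checks in place, a bad index is refused when it is set, and the
consumer-side check remains as a second line of defence in case the shared
inflight region is modified by something other than the setter.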