On 9/20/22 10:43, Liu, Changpeng wrote:


-----Original Message-----
From: Maxime Coquelin <maxime.coque...@redhat.com>
Sent: Tuesday, September 20, 2022 4:13 PM
To: Liu, Changpeng <changpeng....@intel.com>; dev@dpdk.org
Cc: Xia, Chenbo <chenbo....@intel.com>
Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call



On 9/20/22 09:45, Liu, Changpeng wrote:


-----Original Message-----
From: Maxime Coquelin <maxime.coque...@redhat.com>
Sent: Tuesday, September 20, 2022 3:35 PM
To: Liu, Changpeng <changpeng....@intel.com>; dev@dpdk.org
Cc: Xia, Chenbo <chenbo....@intel.com>
Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call



On 9/20/22 09:29, Liu, Changpeng wrote:
Hi Maxime,

-----Original Message-----
From: Maxime Coquelin <maxime.coque...@redhat.com>
Sent: Tuesday, September 20, 2022 3:19 PM
To: Liu, Changpeng <changpeng....@intel.com>; dev@dpdk.org
Cc: Xia, Chenbo <chenbo....@intel.com>
Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call



On 9/6/22 04:22, Changpeng Liu wrote:
Note that this function is in the data path, so the thread context
may not be the same as the socket message processing context. By using
try_lock here, users can try again in case the VQ's access
lock is held by the `vhost-events` thread.

Signed-off-by: Changpeng Liu <changpeng....@intel.com>
---
     lib/vhost/vhost.c | 6 +++++-
     1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 60cb05a0ff..072d2acb7b 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1329,7 +1329,11 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
        if (!vq)
                return -1;

-       rte_spinlock_lock(&vq->access_lock);
+       if (!rte_spinlock_trylock(&vq->access_lock)) {
+               VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+                       "failed to kick guest, virtqueue busy.\n");
+               return -1;
+       }

        if (vq_is_packed(dev))
                vhost_vring_call_packed(dev, vq);

I think that's problematic, because it will break other applications
that currently rely on the API to block until the call is done.

Just some internal DPDK usage of this API:
./drivers/vdpa/ifc/ifcvf_vdpa.c:871:    rte_vhost_vring_call(internal->vid, qid);
./examples/vhost/virtio_net.c:236:      rte_vhost_vring_call(dev->vid, queue_id);
./examples/vhost/virtio_net.c:446:      rte_vhost_vring_call(dev->vid, queue_id);
./examples/vhost_blk/vhost_blk.c:99:    rte_vhost_vring_call(task->ctrlr->vid, vq->id);
./examples/vhost_blk/vhost_blk.c:134:   rte_vhost_vring_call(task->ctrlr->vid, vq->id);

This change will break all the above uses.

And that's not counting external projects.

You should rather introduce a new API that does not block.
Could you add a new API to do this?

I think we can use the new API in SPDK as a workaround. Note that the SPDK
project has been blocked for a while, as it can't be used with DPDK 22.05 or
newer.

DPDK v22.05?
What is the commit introducing the regression?
Here is the commit introducing this issue:
c5736998305d ("vhost: fix missing virtqueue lock protection")
Bugzilla ID: 1015

Ok, it cannot be reverted, as it prevents some undefined
behaviors/crashes.


Note that if we introduce a new API, it won't be backported to stable
branches.
I understand, but do we have a better idea in the short term? We're planning
to release SPDK 22.09 soon.

You can have another thread that sends the call?
We already use two threads to do this. Here is an example from the existing
code in SPDK:

DPDK vhost-events thread                        SPDK thread

     SET_VRING_KICK VQ1       ---->            Start polling VQ1
     Reply to DPDK                    <----              Done
     SET_VRING_KICK VQ2       ---->            thread is blocked on VQ's access
                                               lock, SPDK thread can't provide
                                               reply message
For example, we could just return for the SET_VRING_KICK VQ2 message without
checking the SPDK thread, but this leaves uncertain replies to the VM.

I'm sorry but you will have to find a workaround until v22.11 is out and
you can consume it. We can neither backport a new API nor break all
the other applications that don't handle locking failures.

Regarding the new API for v22.11, it should be named something like
rte_vhost_vring_call_nonblock(), and ideally should return something like
-EAGAIN instead of -1 so that applications can distinguish between a
real failure and a need to retry.

Regards,
Maxime




Vhost-blk and vhost-scsi devices are not the same as vhost-net: we need to
cover the SeaBIOS and VM cases, so we need to start processing vrings as soon
as one vring is ready.

Regards,
Maxime



