On 7/29/20 3:24 PM, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coque...@redhat.com>
>> Sent: Wednesday, July 29, 2020 8:54 PM
>> To: dev@dpdk.org; ma...@mellanox.com; Xia, Chenbo
>> <chenbo....@intel.com>; Liu, Yong <yong....@intel.com>; Wang, Yinan
>> <yinan.w...@intel.com>
>> Cc: tho...@monjalon.net; Yigit, Ferruh <ferruh.yi...@intel.com>;
>> david.march...@redhat.com
>> Subject: Re: [PATCH v3 3/3] net/vhost: fix interrupt mode
>>
>>
>>
>> On 7/29/20 11:20 AM, Maxime Coquelin wrote:
>>> At .new_device() time, only the first vring pair is now ready, other
>>> vrings are consfigured later.
>>>
>>> Problem is that when application will setup and enable interrupts,
>>> only the first queue pair Rx interrupt will be enabled.
>>>
>>> This patches fixes the issue by setting the number of max interrupts
>>> to the number of Rx queues that will be later initialized. Then, as
>>> soon as a Rx vring is ready and interrupt enabled by the application,
>>> it removes the corresponding uninitialized epoll event, and install a
>>> new one with the valid FD.
>>>
>>> Fixes: 604052ae5395 ("net/vhost: support queue update")
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
>>> ---
>>> drivers/net/vhost/rte_eth_vhost.c | 75
>>> +++++++++++++++++++++++++++----
>>> 1 file changed, 66 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c
>>> b/drivers/net/vhost/rte_eth_vhost.c
>>> index 951929c663..237785dd66 100644
>>> --- a/drivers/net/vhost/rte_eth_vhost.c
>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>> @@ -5,6 +5,7 @@
>>> #include <unistd.h>
>>> #include <pthread.h>
>>> #include <stdbool.h>
>>> +#include <sys/epoll.h>
>>>
>>> #include <rte_mbuf.h>
>>> #include <rte_ethdev_driver.h>
>>> @@ -95,6 +96,8 @@ struct vhost_queue {
>>> uint16_t port;
>>> uint16_t virtqueue_id;
>>> struct vhost_stats stats;
>>> + int intr_enable;
>>> + rte_spinlock_t intr_lock;
>>> };
>>>
>>> struct pmd_internal {
>>> @@ -524,6 +527,45 @@ find_internal_resource(char *ifname)
>>> return list;
>>> }
>>>
>>> +static int
>>> +eth_vhost_update_intr(struct rte_eth_dev *eth_dev, uint16_t rxq_idx)
>>> +{
>>> + struct rte_intr_handle *handle = eth_dev->intr_handle;
>>> + struct rte_epoll_event rev;
>>> + int epfd, ret;
>>> +
>>
>> Chenbo reported that we can have a NULL pointer dereference on handle when
>> using Virtio-user on the other end and quitting.
>
> To clarity the root cause, it's because 'destroy_device' calls
> 'eth_vhost_uninstall_intr'
> when connection lost but l3fwd-power APP are still enabling intr.
Agree, l3fwd-power will need some rework, as even with fixing this, we
get flooded with error logs because the app tried to enable the intr in
loop once the frontend has disconnected. But this is also reproducible
in v20.05.
> Thanks,
> Chenbo
>
>>
>>
>>
>>> + if (handle->efds[rxq_idx] == handle->elist[rxq_idx].fd)
>>> + return 0;
>>> +
>>> + VHOST_LOG(INFO, "kickfd for rxq-%d was changed, updating
>> handler.\n",
>>> + rxq_idx);
>>> +
>>> + /*
>>> + * First remove invalid epoll event, and then isntall
>>> + * the new one. May be solved with a proper API in the
>>> + * future.
>>> + */
>>> + epfd = handle->elist[rxq_idx].epfd;
>>> + rev = handle->elist[rxq_idx];
>>> + ret = rte_epoll_ctl(epfd, EPOLL_CTL_DEL, rev.fd,
>>> + &handle->elist[rxq_idx]);
>>> + if (ret) {
>>> + VHOST_LOG(ERR, "Delete epoll event failed.\n");
>>> + return ret;
>>> + }
>>> +
>>> + rev.fd = handle->efds[rxq_idx];
>>> + handle->elist[rxq_idx] = rev;
>>> + ret = rte_epoll_ctl(epfd, EPOLL_CTL_ADD, rev.fd,
>>> + &handle->elist[rxq_idx]);
>>> + if (ret) {
>>> + VHOST_LOG(ERR, "Add epoll event failed.\n");
>>> + return ret;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> static int
>>> eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid) { @@
>>> -537,6 +579,11 @@ eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid)
>>> return -1;
>>> }
>>>
>>> + rte_spinlock_lock(&vq->intr_lock);
>>> + vq->intr_enable = 1;
>>> + ret = eth_vhost_update_intr(dev, qid);
>>> + rte_spinlock_unlock(&vq->intr_lock);
>>> +
>>
>> I missed to check ret value here, will add it in v4.
>>
>>> ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
>>> if (ret < 0) {
>>> VHOST_LOG(ERR, "Failed to get rxq%d's vring\n", qid);
>