> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: Thursday, June 2, 2016 6:24 AM
> To: Daniele Di Proietto <diproiet...@vmware.com>; Traynor, Kevin
> <kevin.tray...@intel.com>; dev@openvswitch.org
> Cc: Dyasly Sergey <s.dya...@samsung.com>; Flavio Leitner <f...@sysclose.org>
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in
> __netdev_dpdk_vhost_send().
>
> On 02.06.2016 04:32, Daniele Di Proietto wrote:
> >
> > On 25/05/2016 04:03, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
> >
> >> On 23.05.2016 17:55, Traynor, Kevin wrote:
> >>>> -----Original Message-----
> >>>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >>>> Sent: Tuesday, May 17, 2016 4:09 PM
> >>>> To: dev@openvswitch.org; Daniele Di Proietto <diproiet...@vmware.com>
> >>>> Cc: Dyasly Sergey <s.dya...@samsung.com>; Heetae Ahn
> >>>> <heetae82....@samsung.com>; Flavio Leitner <f...@sysclose.org>;
> >>>> Traynor, Kevin <kevin.tray...@intel.com>; Pravin B Shelar
> >>>> <pshe...@ovn.org>; Ilya Maximets <i.maxim...@samsung.com>
> >>>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
> >>>> __netdev_dpdk_vhost_send().
> >>>>
> >>>> There are situations where a PMD thread can hang forever inside
> >>>> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
> >>>>
> >>>> This happens if rte_vring_available_entries() stays positive while
> >>>> rte_vhost_enqueue_burst() can't send anything (possible with a broken
> >>>> ring).
> >>>>
> >>>> In this case the time expiration is never checked and the
> >>>> 'do {} while (cnt)' loop becomes infinite.
> >>>>
> >>>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside a
> >>>> guest VM. It can also be reproduced by manually breaking the ring
> >>>> structure inside the guest VM.
> >>>>
> >>>> Fix that by checking time expiration even if we have available
> >>>> entries.
> >>>
> >>> Hi Ilya,
> >>
> >> Hi, Kevin.
> >>
> >> Christian and Thiago CC-ed because, I think, they are facing a similar
> >> issue.
> >>
> >>>
> >>> Thanks for catching this. This intersects with something else I've seen
> >>> wrt retry code and there's a few options...
> >>>
> >>> 1. Remove retries when nothing is sent. For the VM that needs retries it
> >>> is a good thing to have, but Bhanu and I recently saw in a test with
> >>> multiple VMs that if one VM causes a lot of retries there is a large
> >>> performance degradation for the other VMs. So I changed the retry to
> >>> only occur when at least one packet has been sent on the previous call.
> >>> I put a patch up here:
> >>> http://openvswitch.org/pipermail/dev/2016-May/071517.html
> >>>
> >>> If we keep retries we can either:
> >>>
> >>> 2. Make more robust coordination between rte_vring_available_entries()
> >>> and rte_vhost_enqueue_burst(), as per your patch.
> >>>
> >>> 3. As you've shown that we can't rely on rte_vring_available_entries()
> >>> to know we can enqueue, just remove it and use rte_vhost_enqueue_burst()
> >>> directly in the retry loop.
> >>>
> >>> My preference would be for 1., because on balance I'd rather one VM did
> >>> not degrade the performance of others than have it keep its retries. Of
> >>> course there could be some compromise between them as well, i.e. reduce
> >>> the number of retries, but any retries could affect performance for
> >>> another path if they are using the same core.
> >>>
> >>> What do you think?
> >>
> >> I'm worried about scenarios with "pulsing" traffic, i.e. an amount of
> >> packets that is not very big but enough to overload the vring in a short
> >> time, followed by a long period of silence. The HW can keep many more
> >> packets in its RX queues than can be pushed to the virtio ring. In this
> >> scenario, without retrying, most of the packets will be dropped.
> >>
> >> How about just decreasing VHOST_ENQ_RETRY_USECS to, maybe, 1 usec, with
> >> my fix applied of course? Such an interval should be enough for one PMD
> >> thread to handle 20G of traffic with 64B packets. Also, this timeout
> >> could be applied to both cases (something sent or not) to decrease the
> >> cost of retrying.
> >>
> >> Best regards, Ilya Maximets.
> >
> > I haven't done any performance testing with many VMs, but ...
> >
> > I think the retry logic was introduced because at the time
> > NETDEV_MAX_BURST was 192 and we felt that batches of 192 packets could
> > easily overload the guest ring.
> >
> > Now that NETDEV_MAX_BURST is only 32, I agree with Kevin on applying
> > solution 1. Retries can degrade performance for other VMs, which IMHO is
> > worse than dropping packets destined for a "slow" receiver VM.
> >
> > Thanks,
> >
> > Daniele
>
> Ok. I agree that removing the timeout is reasonable.
> Another thing: the implementation provided by Kevin allows a situation (in
> some corner case) where rte_vhost_enqueue_burst() will be called 32 times
> in a row. How much time will that take? Maybe we should limit the number of
> retries or the time spent inside __netdev_dpdk_vhost_send()?
>
> What do you think?
Apologies for the delay in replying. Hmm, that's true. I chatted with Yuanhan
and it's not typical, but there's no guarantee it can't happen. How about a
maximum of 8 retries in that case, and then we drop (roughly the shape
sketched at the end of this mail)? I've put a revised patch up here:
http://openvswitch.org/pipermail/dev/2016-June/072682.html

thanks,
Kevin.

>
> Best regards, Ilya Maximets.
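
For illustration, a rough, self-contained sketch of the bounded retry idea is
below. It is not the revised patch: enqueue_burst() is just a stand-in for
rte_vhost_enqueue_burst(), and MAX_VHOST_TX_RETRIES is a placeholder name.
The loop gives up either when a call sends nothing or after a fixed number of
retries, so a broken ring (enqueue always returning 0 while the available
entries count stays positive) can't wedge a PMD thread, and a single batch
can't monopolize the core with unbounded retries.

#include <stdio.h>

#define MAX_VHOST_TX_RETRIES 8

/* Stand-in for rte_vhost_enqueue_burst(): returns how many of 'cnt'
 * packets the guest ring accepted.  A broken ring would keep
 * returning 0 here. */
static unsigned int
enqueue_burst(unsigned int cnt)
{
    return cnt > 4 ? 4 : cnt;   /* Pretend the ring accepts at most 4. */
}

/* Try to send 'cnt' packets; return how many are left over (dropped). */
static unsigned int
vhost_send_sketch(unsigned int cnt)
{
    unsigned int retries = 0;

    do {
        unsigned int sent = enqueue_burst(cnt);

        cnt -= sent;
        if (cnt == 0) {
            break;              /* Everything went out. */
        }
        /* Retry only if the last call made progress, and never more
         * than MAX_VHOST_TX_RETRIES times in total. */
        if (sent == 0 || ++retries > MAX_VHOST_TX_RETRIES) {
            break;
        }
    } while (cnt);

    return cnt;
}

int
main(void)
{
    printf("dropped %u of 32\n", vhost_send_sketch(32));
    return 0;
}

Whether 8 is the right cap is open to discussion; the point is only that the
worst case per batch is bounded to a handful of enqueue calls.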