On 23.05.2016 17:55, Traynor, Kevin wrote:
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Tuesday, May 17, 2016 4:09 PM
>> To: dev@openvswitch.org; Daniele Di Proietto <diproiet...@vmware.com>
>> Cc: Dyasly Sergey <s.dya...@samsung.com>; Heetae Ahn
>> <heetae82....@samsung.com>; Flavio Leitner <f...@sysclose.org>;
>> Traynor, Kevin <kevin.tray...@intel.com>; Pravin B Shelar
>> <pshe...@ovn.org>; Ilya Maximets <i.maxim...@samsung.com>
>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().
>>
>> There are situations when PMD thread can hang forever inside
>> __netdev_dpdk_vhost_send() because of broken virtqueue ring.
>>
>> This happens if rte_vring_available_entries() is always positive and
>> rte_vhost_enqueue_burst() can't send anything (possible with a broken
>> ring).
>>
>> In this case the time expiration is never checked and the
>> 'do {} while (cnt)' loop will be infinite.
>>
>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside a
>> guest VM. It may also be reproduced by manually breaking the ring
>> structure inside the guest VM.
>>
>> Fix that by checking time expiration even if we have available
>> entries.
> 
> Hi Ilya,

Hi, Kevin.

Christian and Thiago CC'd because, I think, they've faced a similar issue.

> 
> Thanks for catching this. This intersects with something else I've seen
> wrt retry code and there's a few options...
> 
> 1. Remove retries when nothing sent. For the VM that needs retries it is a
> good thing to have, but Bhanu and I saw in a test with multiple VM's recently
> that if one VM causes a lot of retries there is a large performance
> degradation for the other VM's. So I changed the retry to only occur when at
> least one packet has been sent on the previous call. I put a patch up here:
> http://openvswitch.org/pipermail/dev/2016-May/071517.html
> 
> If we keep retries we can either
> 
> 2. Make more robust coordination between rte_ring_available_entries() and
> rte_vhost_enqueue_burst(), as per your patch.
> 
> 3. As you've shown that we can't rely on rte_ring_available_entries() to
> know we can enqueue, how about just remove it and use
> rte_vhost_enqueue_burst() directly in the retry loop.
> 
> My preference would be for 1. because on balance I'd rather one VM did not
> degrade performance of others, more than I'd like it to have retries. Of
> course there could be some compromise between them as well, i.e. reduce the
> amount of retries, but any retries could affect performance for another path
> if they are using the same core.
> 
> What do you think?

I'm worried about scenarios with "pulsing" traffic, i.e. bursts that are not
very big but still enough to overload the vring in a short time, followed by a
long period of silence. The HW can keep many more packets in its RX queues
than can be pushed to the virtio ring, so in this scenario, without retrying,
most of the packets will be dropped.

How about just decreasing VHOST_ENQ_RETRY_USECS to, maybe, 1 usec (with my fix
applied, of course)? Such an interval should be enough to handle 20G of traffic
with 64B packets on one PMD thread. Also, this timeout could be applied in both
cases (something sent or not) to decrease the cost of retrying.

Best regards, Ilya Maximets.

> 
> Kevin. 
> 
>>
>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>> ---
>>
>> How to reproduce manually:
>>
>> * Start packet flow.
>>
>> * Start testpmd inside guest VM under gdb:
>>      # gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 \
>>            -w 0000:00:0a.0 -- --burst=64 --txd=512 --rxd=512
>>
>> * Break virtqueue ring somehow:
>>      ^C
>>      # (gdb) break virtqueue_enqueue_xmit
>>      # (gdb) p txvq->vq_ring.avail->idx
>>      $1 = 30784
>>      # (gdb) p &txvq->vq_ring.avail->idx
>>      $2 = (uint16_t *) 0x7fff7ea9b002
>>      # (gdb) set {uint16_t}0x7fff7ea9b002 = 0
>>      # (gdb) disable 1
>>      # (gdb) c
>>      Continuing.
>>
>> * Forcibly stop testpmd:
>>      ^C
>>      # (gdb) quit
>>      A debugging session is active.
>>      Quit anyway? (y or n) y
>>
>> * Start testpmd inside VM again and see results:
>>      # gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 \
>>            -w 0000:00:0a.0 -- --burst=64 --txd=512 --rxd=512
>>      EAL: Detected 5 lcore(s)
>>      EAL: Probing VFIO support...
>>      EAL: PCI device 0000:00:0a.0 on NUMA socket -1
>>      EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>>      *******SSH hangs and QEMU crashes here*******
>>
>> * On the OVS side we can see hang of PMD thread:
>>      |00003|ovs_rcu(urcu2)|WARN|blocked 4005 ms waiting for pmd54 to quiesce
>>      |00004|ovs_rcu(urcu2)|WARN|blocked 8005 ms waiting for pmd54 to quiesce
>>      |00005|ovs_rcu(urcu2)|WARN|blocked 16004 ms waiting for pmd54 to quiesce
>>      ....
>>
>>
>> lib/netdev-dpdk.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 87879d5..aef6ea4 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1367,24 +1367,24 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>>              cur_pkts = &cur_pkts[tx_pkts];
>>          } else {
>>              uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
>> -            unsigned int expired = 0;
>> +            bool expired = false;
>>
>> -            if (!start) {
>> +            if (start) {
>> +                expired = (rte_get_timer_cycles() - start) > timeout;
>> +            } else {
>>                  start = rte_get_timer_cycles();
>>              }
>> -
>>              /*
>>               * Unable to enqueue packets to vhost interface.
>>               * Check available entries before retrying.
>>               */
>> -            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
>> -                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
>> -                    expired = 1;
>> -                    break;
>> -                }
>> +            while (!rte_vring_available_entries(virtio_dev, vhost_qid)
>> +                   && !expired) {
>> +                expired = (rte_get_timer_cycles() - start) > timeout;
>>              }
>> +
>>              if (expired) {
>> -                /* break out of main loop. */
>> +                /* Retrying took too much time. Break out of main loop. */
>>                  break;
>>              }
>>          }
>> --
>> 2.5.0
> 
> 
> 
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
