I’ve been doing quite a bit of investigation since my last email, in particular 
adding instrumentation on barrier calls to report open/lowering/closed/raising 
times, along with call trees and nesting levels.

As a result I believe I now have a clearer understanding of what’s leading to 
the packet loss I’m observing when using the API, along with some code changes 
which in my testing reliably eliminate the 500K packet loss I was previously 
observing.

Would either of you (or anyone else on the list) be able to offer an opinion 
on my understanding of the causes, and on my proposed solutions?

Thanks in advance,

Colin.
---------
In terms of observed barrier hold times, I’m seeing two main issues related to 
API calls:


  *   When I issue a long string of async API commands, there is no logic (at 
least in the version of VPP I’m using) to space out their processing.  As a 
result, if there is a queue of requests, the barrier is opened for just a few 
us between API calls, before lowering again.  This is enough to start one burst 
of packet processing per worker thread (I can see the barrier lower ends up 
taking ~100us), but over time not enough to keep up with the input traffic.

  *   Whilst many API calls close the barrier for between a few tens of 
microseconds and a few hundred microseconds, there are a number of calls where 
this extends from 500us+ into the multiple ms range (which obviously causes 
the Rx ring buffers to overflow).  The particular API calls where I’ve seen 
this include: ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
sw_interface_set_l2_bridge & sw_interface_add_del_address (though there may be 
others which I’m not currently calling).

Digging into the call stacks, I can see that in each case there are multiple 
calls to vlib_node_runtime_update() (I assume one for each node changed), and 
each of these calls invokes vlib_worker_thread_node_runtime_update() just 
before returning (I assume to sync the per-thread data structures with the 
updated graph).  The observed execution time for 
vlib_worker_thread_node_runtime_update() seems to vary with load, config size, 
etc., but times of between 400us and 800us per call are not atypical in my 
setup.  If there are 5 or 6 invocations of this function per API call, the 
barrier therefore rapidly ends up being held for multiple ms.
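
For anyone wanting to reproduce the numbers above, the timing was conceptually 
along the following lines (a simplified, standalone sketch rather than the 
actual instrumentation patch; the function names here are purely illustrative):

/* Rough standalone model of the barrier-hold timing: stamp the close
 * transition with a monotonic clock and report long holds on re-open.
 * (Illustrative only -- the real hooks live in vlib/vlib/threads.c.) */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static inline uint64_t now_us (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000ull + ts.tv_nsec / 1000;
}

static uint64_t barrier_closed_at_us;

/* Called just after the barrier has been closed (workers parked). */
static void barrier_closed (void)
{
  barrier_closed_at_us = now_us ();
}

/* Called just before the barrier is re-opened; report holds over 500us. */
static void barrier_opening (const char *caller)
{
  uint64_t held_us = now_us () - barrier_closed_at_us;
  if (held_us > 500)
    fprintf (stderr, "barrier held %llu us by %s\n",
             (unsigned long long) held_us, caller);
}

int main (void)
{
  barrier_closed ();
  usleep (1000);              /* pretend an API call held the barrier ~1ms */
  barrier_opening ("example_api_call");
  return 0;
}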

The two workarounds I’ve been using are both changes to vlib/vlib/threads.c:


  *   When closing the barrier in vlib_worker_thread_barrier_sync (but not for 
recursive invocations), if it hasn’t been open for at least a certain minimum 
period of time (I’ve been running with 300us), then spin until this minimum is 
reached before closing.  This ensures that whatever the source of the barrier 
sync (API, command line, etc.), the datapath is always allowed a fair fraction 
of time to run.  (I’ve got in mind various adaptive ways of setting the delay, 
including a rolling measure of the open period over say the last 1ms, and/or 
Rx ring state, but for initial testing a fixed value seemed easiest; there’s a 
rough sketch of this change further below.)

  *   From my (potentially superficial) code read, it looks like 
vlib_worker_thread_node_runtime_update() could be called once to update the 
workers with multiple node changes (as long as the barrier remains closed 
between changes), rather than having to be called for each individual change.

I have therefore tweaked vlib_worker_thread_node_runtime_update(), so that 
instead of doing the update to the per-thread data structures, by default it 
simply increments a count and returns.  The count is cleared each time the 
barrier is closed in vlib_worker_thread_barrier_sync() (but not for recursive 
invocations), and if it is non-zero when vlib_worker_thread_barrier_release() 
is about to open the barrier, then vlib_worker_thread_node_runtime_update() is 
called again with a flag which causes it to actually do the updating.  This 
means that the per-thread data structures are only updated once per API call, 
rather than for each individual node change.
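
In outline, the pattern is something like the following (a much simplified, 
standalone model of the change rather than the actual diff to threads.c; the 
names are illustrative):

/* Simplified model of the deferred-update change.  Instead of rebuilding
 * the per-thread runtimes on every node change, just count the changes and
 * do a single rebuild when the barrier is about to be released. */
#include <stdio.h>

static int deferred_node_updates;   /* pending node-change count */
static int barrier_recursion_depth; /* >0 inside a nested barrier sync */

static void do_worker_runtime_rebuild (void)
{
  /* ...the expensive per-thread data structure rebuild goes here... */
  printf ("rebuilding worker runtimes once for %d change(s)\n",
          deferred_node_updates);
}

/* Was: rebuild immediately on every call.  Now: just record that a
 * rebuild is needed before the barrier next opens. */
static void node_runtime_update (void)
{
  deferred_node_updates++;
}

static void barrier_sync (void)
{
  if (barrier_recursion_depth++ == 0)
    deferred_node_updates = 0;       /* fresh count for this barrier hold */
  /* ...park the worker threads as before... */
}

static void barrier_release (void)
{
  if (--barrier_recursion_depth > 0)
    return;
  if (deferred_node_updates)
    do_worker_runtime_rebuild ();    /* one rebuild, barrier still closed */
  deferred_node_updates = 0;
  /* ...let the worker threads run again... */
}

int main (void)
{
  barrier_sync ();
  node_runtime_update ();  /* e.g. a tunnel add touches several nodes */
  node_runtime_update ();
  node_runtime_update ();
  barrier_release ();      /* one rebuild instead of three */
  return 0;
}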

In my testing this change has reduced the period for which the problem API 
calls close the barrier from multiple ms to sub-ms (generally under 500us).  I 
have not yet observed any negative consequences (though I fully accept I might 
well have missed something).
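
For completeness, the minimum open time change from the first bullet is 
conceptually just the following (again a standalone sketch, with the fixed 
300us value rather than anything adaptive):

/* Simplified model of the minimum-open-time change.  Before closing the
 * barrier (for a non-recursive sync), spin until it has been open for at
 * least BARRIER_MIN_OPEN_US, so the datapath always gets a share of time. */
#include <stdint.h>
#include <time.h>

#define BARRIER_MIN_OPEN_US 300

static inline uint64_t now_us (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000ull + ts.tv_nsec / 1000;
}

static uint64_t barrier_opened_at_us;

static void barrier_sync_with_min_open (int recursion_depth)
{
  if (recursion_depth == 0)
    {
      /* Don't slam the barrier shut straight after opening it: give the
       * workers at least BARRIER_MIN_OPEN_US of runtime first. */
      while (now_us () - barrier_opened_at_us < BARRIER_MIN_OPEN_US)
        ; /* spin */
    }
  /* ...close the barrier as before... */
}

static void barrier_release_record_open (int recursion_depth)
{
  /* ...open the barrier as before... */
  if (recursion_depth == 0)
    barrier_opened_at_us = now_us ();
}

int main (void)
{
  barrier_release_record_open (0);  /* barrier opens, timestamp recorded */
  barrier_sync_with_min_open (0);   /* spins until 300us have elapsed */
  return 0;
}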

Together these two changes eliminate the packet loss I was seeing when using 
the API under load.

Views?

(Whilst the API packet loss is currently most important to me, I believe I may 
have also tracked down the cause of the packet loss when issuing debug 
commands.  It seems as if the debug commands which produce output can block 
whilst the data is flushed, and if this occurs with the barrier down, then we 
get similar overflow on the Rx rings.  Having said that, because the API 
problems are more critical, I’ve not yet tried any workarounds.)


From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Colin Tregenza Dancer via vpp-dev
Sent: 22 August 2017 15:05
To: Neale Ranns (nranns) <nra...@cisco.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

With my current setup (a fairly modest 2Mpps of background traffic each way 
between a pair of 10G ports on an Intel X520 NIC, with baremetal Ubuntu 16, vpp 
17.01 and a couple of cores per NIC), I observed a range of different packet 
loss scenarios:


  *   1K-80K packets lost if I issue any of a range of stats/info commands from 
the telnet command line: “show hard”, “show int”, “show ip arp”, “show ip fib”, 
“show fib path”.   (I haven’t yet tried the same calls via the API, but from 
code reading would expect similar results.)
  *   Issuing an “ip route add” / “ip route del” pair from the telnet command 
line, I see 0.5K-30K packets dropped, mainly on the del.
  *   Using the API, if I issue a closely spaced sequence of commands to 
create a new GRE tunnel and set up individual forwarding entries for 64 
endpoints at the other end of that tunnel, I see 100K-500K packets dropped.

Cheers,

Colin.

P.S. Have fun on the beach!


From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: 22 August 2017 14:35
To: Colin Tregenza Dancer <c...@metaswitch.com>; Florin Coras 
<fcoras.li...@gmail.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline


Hi Colin,

Your comments were not taken as criticism ☺ constructive comments are always 
greatly appreciated.

Apart from the non-MP safe APIs Florin mentioned, and the route add/del cases I 
covered, the consensus is certainly that packet loss should not occur during a 
‘typical’ update and we will do what we can to address it.
Could you give us* some specific examples of the operations you do where you 
see packet loss?

Thanks,
Neale

*I say us not me as I’m about to hit the beach for a couple of weeks.
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
