I’ve been doing quite a bit of investigation since my last email, in particular adding instrumentation on barrier calls to report open/lowering/closed/raising times, along with calling trees and nesting levels.
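(For reference, the shape of that timing instrumentation is roughly as below. This is a simplified, self-contained sketch rather than the actual patch: the names are invented, the call-tree capture is omitted, and the nesting level is just a counter maintained by the caller.)

/* Hypothetical sketch of the barrier timing instrumentation described
 * above (not the real patch): timestamp each transition and report how
 * long the barrier spent open and closed, plus the nesting level. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef struct
{
  uint64_t opened_ns;   /* barrier finished raising (datapath running) */
  uint64_t closed_ns;   /* all workers parked (datapath stopped)       */
  int nesting;          /* recursion level of the sync call            */
} barrier_trace_t;

static uint64_t
trace_now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Call just after the barrier has been lowered (all workers parked). */
static void
trace_barrier_closed (barrier_trace_t * t)
{
  t->closed_ns = trace_now_ns ();
  printf ("barrier was open for %lu ns (nesting %d)\n",
          (unsigned long) (t->closed_ns - t->opened_ns), t->nesting);
}

/* Call just after the barrier has been raised (workers released). */
static void
trace_barrier_opened (barrier_trace_t * t)
{
  t->opened_ns = trace_now_ns ();
  printf ("barrier was closed for %lu ns\n",
          (unsigned long) (t->opened_ns - t->closed_ns));
}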
As a result I believe I now have a clearer understanding of what’s leading to the packet loss I’m observing when using the API, along with some code changes which in my testing reliably eliminate the 500K packet loss I was previously observing. Would either of you (or anyone else on the list) be able to offer your opinions on my understanding of the causes, along with my proposed solutions?

Thanks in advance,

Colin.

---------

In terms of observed barrier hold times, I’m seeing two main issues related to API calls:

* When I issue a long string of async API commands, there is no logic (at least in the version of VPP I’m using) to space out their processing. As a result, if there is a queue of requests, the barrier is opened for just a few us between API calls before lowering again. This is enough to start one burst of packet processing per worker thread (I can see the barrier lower ends up taking ~100us), but over time it is not enough to keep up with the input traffic.

* Whilst many API calls close the barrier for between a few tens of microseconds and a few hundred microseconds, there are a number of calls where this extends from 500us+ into the multiple-ms range (which obviously causes the Rx ring buffers to overflow). The particular API calls where I’ve seen this include ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, sw_interface_set_l2_bridge & sw_interface_add_del_address (though there may be others which I’m not currently calling). Digging into the call stacks, I can see that in each case there are multiple calls to vlib_node_runtime_update() (I assume one for each node changed), and each of these calls invokes vlib_worker_thread_node_runtime_update() just before returning (I assume to sync the per-thread data structures with the updated graph). The observed execution time for vlib_worker_thread_node_runtime_update() seems to vary with load, config size, etc., but times of between 400us and 800us per call are not atypical in my setup. If there are 5 or 6 invocations of this function per API call, we therefore rapidly get to a situation where the barrier is held for multiple ms.

The two workarounds I’ve been using are both changes to vlib/vlib/threads.c (a sketch of both follows after this list):

* When closing the barrier in vlib_worker_thread_barrier_sync() (but not for recursive invocations), if it hasn’t been open for at least a certain minimum period of time (I’ve been running with 300us), spin until this minimum is reached before closing. This ensures that whatever the source of the barrier sync (API, command line, etc.), the datapath is always allowed a fair fraction of time to run. (I’ve got in mind various adaptive ways of setting the delay, including a rolling measure of the open period over say the last 1ms, and/or Rx ring state, but for initial testing a fixed value seemed easiest.)

* From my (potentially superficial) code read, it looks like vlib_worker_thread_node_runtime_update() could be called once to update the workers with multiple node changes (as long as the barrier remains closed between changes), rather than having to be called for each individual change. I have therefore tweaked vlib_worker_thread_node_runtime_update() so that, instead of doing the update to the per-thread data structures, by default it simply increments a count and returns. The count is cleared each time the barrier is closed in vlib_worker_thread_barrier_sync() (but not for recursive invocations), and if it is non-zero when vlib_worker_thread_barrier_release() is about to open the barrier, vlib_worker_thread_node_runtime_update() is called again with a flag which causes it to actually do the updating. This means that the per-thread data structures are only updated once per API call, rather than for each individual node change. In my testing this change has reduced the period for which the problem API calls close the barrier from multiple ms to sub-ms (generally under 500us). I have not yet observed any negative consequences (though I fully accept I might well have missed something).
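For concreteness, here is the shape of the two changes as a simplified, self-contained model rather than the actual threads.c diff: barrier_t, barrier_sync(), node_runtime_update(), barrier_release() and MIN_OPEN_TIME_NS are invented stand-ins for the corresponding vlib_worker_thread_* functions and state, and the worker park/release handshakes are reduced to bare spins.

/* Self-contained model of the two proposed changes (names are
 * hypothetical stand-ins for the vlib_worker_thread_* equivalents). */
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define MIN_OPEN_TIME_NS (300 * 1000)   /* 300us minimum datapath window */

typedef struct
{
  atomic_int wait_at_barrier;      /* workers park while this is set     */
  atomic_int workers_at_barrier;   /* number of workers currently parked */
  int recursion_level;             /* main thread only                   */
  uint64_t last_open_ns;           /* when the barrier was last raised   */
  unsigned deferred_node_updates;  /* pending graph-change notifications */
  int n_workers;
} barrier_t;

static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Change 1: before closing the barrier, spin until the datapath has had
 * at least MIN_OPEN_TIME_NS to run since the last release. */
void
barrier_sync (barrier_t * b)
{
  if (b->recursion_level++ > 0)
    return;                        /* recursive call: already closed     */

  while (now_ns () - b->last_open_ns < MIN_OPEN_TIME_NS)
    ;                              /* give the workers a fair time slice */

  /* Change 2: no deferred graph updates outstanding for this sync. */
  b->deferred_node_updates = 0;

  atomic_store (&b->wait_at_barrier, 1);
  while (atomic_load (&b->workers_at_barrier) != b->n_workers)
    ;                              /* wait for all workers to park       */
}

/* Change 2: instead of rebuilding the per-worker node runtimes on every
 * individual graph change, just count that a rebuild is needed. */
void
node_runtime_update (barrier_t * b)
{
  b->deferred_node_updates++;
}

void
barrier_release (barrier_t * b)
{
  if (--b->recursion_level > 0)
    return;                        /* recursive call: keep barrier closed */

  if (b->deferred_node_updates)
    {
      /* Do the expensive per-worker rebuild exactly once per sync, while
         the barrier is still closed; in the real code this would be the
         flagged call back into the runtime-update function. */
      b->deferred_node_updates = 0;
    }

  b->last_open_ns = now_ns ();     /* start of the next open window      */
  atomic_store (&b->wait_at_barrier, 0);
  while (atomic_load (&b->workers_at_barrier) > 0)
    ;                              /* wait for the workers to resume     */
}

With this structure an API call which changes five or six nodes still calls node_runtime_update() five or six times, but the per-worker rebuild happens only once, just before the barrier is raised, and every non-recursive barrier sync is guaranteed to have left the datapath at least 300us of running time beforehand.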
Together these two changes eliminate the packet loss I was seeing when using the API under load. Views?

(Whilst the API packet loss is currently most important to me, I believe I may have also tracked down the cause of the packet loss when issuing debug commands. It seems as if the debug commands which produce output can block whilst the data is flushed, and if this occurs with the barrier down, then we get similar overflow on the Rx rings. Having said that, because the API problems are more critical, I’ve not yet tried any workarounds.)

From: vpp-dev-boun...@lists.fd.io On Behalf Of Colin Tregenza Dancer via vpp-dev
Sent: 22 August 2017 15:05
To: Neale Ranns (nranns) <nra...@cisco.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

With my current setup (a fairly modest 2Mpps of background traffic each way between a pair of 10G ports on an Intel X520 NIC, with baremetal Ubuntu 16, vpp 17.01 and a couple of cores per NIC), I observed a range of different packet loss scenarios:

* 1K-80K packets lost if I issue any of a range of stats/info commands from the telnet command line: “show hard”, “show int”, “show ip arp”, “show ip fib”, “show fib path”. (I haven’t yet tried the same calls via the API, but from code reading would expect similar results.)

* Issuing an “ip route add” / “ip route del” pair from the telnet command line, I see 0.5K-30K packets dropped, mainly on the del.

* Using the API, if I issue a close sequence of commands to create a new GRE tunnel and set up individual forwarding entries for 64 endpoints at the other end of that tunnel, I see 100K-500K packets dropped.

Cheers,

Colin.

P.S. Have fun on the beach!

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: 22 August 2017 14:35
To: Colin Tregenza Dancer <c...@metaswitch.com>; Florin Coras <fcoras.li...@gmail.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Hi Colin,

Your comments were not taken as criticism ☺ Constructive comments are always greatly appreciated. Apart from the non-MP-safe APIs Florin mentioned, and the route add/del cases I covered, the consensus is certainly that packet loss should not occur during a ‘typical’ update and we will do what we can to address it. Could you give us* some specific examples of the operations you do where you see packet loss?

Thanks,
Neale

*I say us not me as I’m about to hit the beach for a couple of weeks.