Colin,

Good investigation!

A good first step would be to make all APIs and CLIs thread safe. When an
API/CLI is thread safe, that must be flagged through the is_mp_safe flag. It
is quite likely that many already are, but haven't been flagged as such.
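
For illustration, the change is typically a one-liner where the handler
table is set up (the message ID below is just an example, and the exact
file varies by release); once flagged, the API dispatcher will invoke the
handler without stopping the workers:

  #include <vlibapi/api.h>

  /* Sketch: e.g. in vpp/api/api.c, alongside the handler registrations.
   * VL_API_IP_ADD_DEL_ROUTE comes from the generated .api headers. */
  static void
  mark_handlers_mp_safe (void)
  {
    api_main_t *am = &api_main;

    /* This handler takes its own locks / touches no shared state, so
     * the dispatcher may run it without closing the worker barrier. */
    am->is_mp_safe[VL_API_IP_ADD_DEL_ROUTE] = 1;
  }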

Best regards,
Ole

> On 31 Aug 2017, at 19:07, Colin Tregenza Dancer via vpp-dev
> <vpp-dev@lists.fd.io> wrote:
>
> I’ve been doing quite a bit of investigation since my last email, in
> particular adding instrumentation on barrier calls to report
> open/lowering/closed/raising times, along with calling trees and nesting
> levels.
>
> As a result I believe I now have a clearer understanding of what’s leading
> to the packet loss I’m observing when using the API, along with some code
> changes which in my testing reliably eliminate the 500K packet loss I was
> previously observing.
>
> Would either of you (or anyone else on the list) be able to offer their
> opinions on my understanding of the causes, along with my proposed
> solutions?
>
> Thanks in advance,
>
> Colin.
> ---------
> In terms of observed barrier hold times, I’m seeing two main issues
> related to API calls:
>
> • When I issue a long string of async API commands, there is no logic (at
> least in the version of VPP I’m using) to space out their processing. As a
> result, if there is a queue of requests, the barrier is opened for just a
> few us between API calls, before lowering again. This is enough to start
> one burst of packet processing per worker thread (I can see the barrier
> lower ends up taking ~100us), but over time not enough to keep up with the
> input traffic.
>
> • Whilst many API calls close the barrier for between a few tens of
> microseconds and a few hundred microseconds, there are a number of calls
> where this extends from 500us+ into the multiple ms range (which obviously
> causes the Rx ring buffers to overflow). The particular API calls where
> I’ve seen this include: ip_neighbor_add_del, gre_add_del_tunnel,
> create_loopback, sw_interface_set_l2_bridge &
> sw_interface_add_del_address (though there may be others which I’m not
> currently calling).
>
> Digging into the call stacks, I can see that in each case there are
> multiple calls to vlib_node_runtime_update() (I assume one for each node
> changed), and each of these calls invokes
> vlib_worker_thread_node_runtime_update() just before returning (I assume
> to sync the per thread data structures with the updated graph). The
> observed execution time for vlib_worker_thread_node_runtime_update() seems
> to vary with load, config size, etc, but times of between 400us and 800us
> per call are not atypical in my setup. If there are 5 or 6 invocations of
> this function per API call, we therefore rapidly get to a situation where
> the barrier is held for multiple ms.
>
> The two workarounds I’ve been using are both changes to
> vlib/vlib/threads.c (a rough sketch of the first follows the list):
>
> • When closing the barrier in vlib_worker_thread_barrier_sync (but not
> for recursive invocations), if it hasn’t been open for at least a certain
> minimum period of time (I’ve been running with 300us), then spin until
> this minimum is reached, before closing. This ensures that whatever the
> source of the barrier sync (API, command line, etc), the datapath is
> always allowed a fair fraction of time to run. (I’ve got in mind various
> adaptive ways of setting the delay, including a rolling measure of open
> period over say the last 1ms, and/or Rx ring state, but for initial
> testing a fixed value seemed easiest.)
>
> • From my (potentially superficial) code read, it looks like
> vlib_worker_thread_node_runtime_update() could be called once to update
> the workers with multiple node changes (as long as the barrier remains
> closed between changes), rather than having to be called for each
> individual change.
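>
> For concreteness, a rough sketch of the first change (the names
> barrier_last_released and min_open are mine, and the real patch also has
> to record the timestamp in vlib_worker_thread_barrier_release()):
>
>   #include <vlib/vlib.h>
>
>   /* vlib/vlib/threads.c: updated each time the barrier is released. */
>   static f64 barrier_last_released;
>
>   /* Called at the top of vlib_worker_thread_barrier_sync(), for
>    * non-recursive invocations only. */
>   static inline void
>   barrier_enforce_min_open_time (vlib_main_t * vm)
>   {
>     const f64 min_open = 300e-6;  /* fixed 300us for initial testing */
>     f64 deadline = barrier_last_released + min_open;
>
>     /* Spin until the datapath has had at least min_open to process
>      * packets since the barrier last opened, then allow the close. */
>     while (vlib_time_now (vm) < deadline)
>       ;
>   }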
>
> Following on from the second bullet, I have tweaked
> vlib_worker_thread_node_runtime_update(), so that instead of doing the
> update to the per thread data structures, by default it simply increments
> a count and returns. The count is cleared each time the barrier is closed
> in vlib_worker_thread_barrier_sync() (but not for recursive invocations),
> and if it is non-zero when vlib_worker_thread_barrier_release() is about
> to open the barrier, vlib_worker_thread_node_runtime_update() is called
> again with a flag which causes it to actually do the updating. This means
> that the per thread data structures are only updated once per API call,
> rather than for each individual node change.
>
> In my testing this change has reduced the period for which the problem
> API calls close the barrier from multiple ms to sub-ms (generally under
> 500us). I have not yet observed any negative consequences (though I fully
> accept I might well have missed something).
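>
> In outline the deferral looks like this (the counter and the _force name
> are mine; the enclosing functions are the existing ones in
> vlib/vlib/threads.c):
>
>   /* Counts graph changes made while the barrier is closed. */
>   static u32 node_runtime_updates_pending;
>
>   /* New body of vlib_worker_thread_node_runtime_update(): just note
>    * that the workers' per thread copies of the graph are stale. */
>   void
>   vlib_worker_thread_node_runtime_update (void)
>   {
>     node_runtime_updates_pending++;
>   }
>
>   /* In vlib_worker_thread_barrier_sync(), non-recursive case: nothing
>    * can be pending from before this sync, so clear the counter. */
>   node_runtime_updates_pending = 0;
>
>   /* In vlib_worker_thread_barrier_release(), just before the barrier
>    * opens: apply all the accumulated changes in a single pass. */
>   if (node_runtime_updates_pending)
>     {
>       node_runtime_updates_pending = 0;
>       /* hypothetical name for the original update logic */
>       vlib_worker_thread_node_runtime_update_force ();
>     }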
>
> Together these two changes eliminate the packet loss I was seeing when
> using the API under load.
>
> Views?
>
> (Whilst the API packet loss is currently most important to me, I believe
> I may have also tracked down the cause of the packet loss when issuing
> debug commands. It seems as if the debug commands which produce output can
> block whilst the data is flushed, and if this occurs with the barrier
> down, then we get similar overflow on the Rx rings. Having said that,
> because the API problems are more critical, I’ve not yet tried any
> workarounds.)
>
> From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On
> Behalf Of Colin Tregenza Dancer via vpp-dev
> Sent: 22 August 2017 15:05
> To: Neale Ranns (nranns) <nra...@cisco.com>
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Packet loss on use of API & cmdline
>
> With my current setup (a fairly modest 2Mpps of background traffic each
> way between a pair of 10G ports on an Intel X520 NIC, with baremetal
> Ubuntu 16, vpp 17.01 and a couple of cores per NIC), I observed a range
> of different packet loss scenarios:
>
> • 1K-80K packets lost if I issue any of a range of stats/info commands
> from the telnet command line: “show hard”, “show int”, “show ip arp”,
> “show ip fib”, “show fib path”. (I haven’t yet tried the same calls via
> the API, but from code reading would expect similar results.)
> • Issuing an “ip route add” / “ip route del” pair from the telnet command
> line, I see 0.5K-30K packets dropped, mainly on the del.
> • Using the API, if I issue a close sequence of commands to create a new
> GRE tunnel and set up individual forwarding entries for 64 endpoints at
> the other end of that tunnel, I see 100K-500K packets dropped.
>
> Cheers,
>
> Colin.
>
> P.S. Have fun on the beach!
>
>
> From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
> Sent: 22 August 2017 14:35
> To: Colin Tregenza Dancer <c...@metaswitch.com>; Florin Coras
> <fcoras.li...@gmail.com>
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Packet loss on use of API & cmdline
>
> Hi Colin,
>
> Your comments were not taken as criticism :) constructive comments are
> always greatly appreciated.
>
> Apart from the non-MP safe APIs Florin mentioned, and the route add/del
> cases I covered, the consensus is certainly that packet loss should not
> occur during a ‘typical’ update and we will do what we can to address it.
> Could you give us* some specific examples of the operations you do where
> you see packet loss?
>
> Thanks,
> Neale
>
> *I say us not me as I’m about to hit the beach for a couple of weeks.
> _______________________________________________
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev