Colin,

Good investigation!

A good first step would be to make all APIs and CLIs thread safe.
When an API/CLI is thread safe, that must be indicated by setting its
is_mp_safe flag.
It is quite likely that many already are, but haven't been flagged as such.
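
For reference, a minimal sketch of how that flagging looks in an API
hookup function (the hookup function name is illustrative, and
VL_API_IP_ADD_DEL_ROUTE is used purely as an example message id):

  #include <vlibapi/api.h>

  static clib_error_t *
  example_api_hookup (vlib_main_t * vm)
  {
    api_main_t *am = &api_main;

    /* ... vl_msg_api_set_handlers (...) calls for each message ... */

    /* Handler has been audited as thread safe: the dispatcher may
       invoke it without taking the worker-thread barrier. */
    am->is_mp_safe[VL_API_IP_ADD_DEL_ROUTE] = 1;

    return 0;
  }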

Best regards,
Ole


> On 31 Aug 2017, at 19:07, Colin Tregenza Dancer via vpp-dev 
> <vpp-dev@lists.fd.io> wrote:
> 
> I’ve been doing quite a bit of investigation since my last email, in 
> particular adding instrumentation on barrier calls to report 
> open/lowering/closed/raising times, along with calling trees and nesting 
> levels.
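> 
> (The instrumentation is conceptually along these lines: a hypothetical 
> sketch rather than my actual patch, in which barrier_sync_instrumented 
> is an illustrative wrapper and vm->barrier_open_since is an assumed 
> added field, stamped whenever the barrier opens:
> 
>   #include <vppinfra/time.h>
> 
>   static void
>   barrier_sync_instrumented (vlib_main_t * vm)
>   {
>     f64 t0 = clib_time_now (&vm->clib_time);
> 
>     vlib_worker_thread_barrier_sync (vm);  /* lower + wait for workers */
> 
>     f64 t1 = clib_time_now (&vm->clib_time);
>     clib_warning ("open %.1fus, lowering took %.1fus",
>                   (t0 - vm->barrier_open_since) * 1e6, (t1 - t0) * 1e6);
>   }
> 
> with the matching closed/raising times captured around 
> vlib_worker_thread_barrier_release().)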
> 
> As a result I believe I now have a clearer understanding of what’s leading to 
> the packet loss I’m observing when using the API, along with some code 
> changes which in my testing reliably eliminate the 500K packet loss I was 
> previously observing.
> 
> Would either of you (or anyone else on the list) be able to offer their 
> opinions on my understanding of the causes, along with my proposed solutions?
> 
> Thanks in advance,
> 
> Colin.
> ---------
> In terms of observed barrier hold times, I’m seeing two main issues related 
> to API calls:
> 
>       • When I issue a long string of async API commands, there is no logic 
> (at least in the version of VPP I’m using) to space out their processing.  As 
> a result, if there is a queue of requests, the barrier is opened for just a 
> few us between API calls, before lowering again.  This is enough to start one 
> burst of packet processing per worker thread (I can see the barrier lower 
> ends up taking ~100us), but over time this is not enough to keep up with the 
> input traffic.
> 
>       • Whilst many API calls close the barrier for between a few 10’s of 
> microseconds and a few hundred microseconds, there are a number of calls 
> where this extends from 500us+ into the multiple ms range (which obviously 
> causes the Rx ring buffers to overflow).  The particular API calls where I’ve 
> seen this include:  ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
> sw_interface_set_l2_bridge & sw_interface_add_del_address (though there may 
> be others which I’m not currently calling).
> 
> Digging into the call stacks, I can see that in each case there are multiple 
> calls to vlib_node_runtime_update()  (I assume one for each node changed), 
> and each of these calls invokes vlib_worker_thread_node_runtime_update() just 
> before returning (I assume to sync the per thread datastructures with the 
> updated graph).  The observed execution time for 
> vlib_worker_thread_node_runtime_update() seems to vary with load, config 
> size, etc., but times of between 400us and 800us per call are not atypical in 
> my setup.  If there are 5 or 6 invocations of this function per API call, we 
> therefore rapidly get to a situation where the barrier is held for multiple 
> ms.
> 
> The two workarounds I’ve been using are both changes to vlib/vlib/threads.c :
> 
>       • When closing the barrier in vlib_worker_thread_barrier_sync() (but 
> not for recursive invocations), if it hasn’t been open for at least a 
> certain minimum period of time (I’ve been running with 300us), spin until 
> this minimum is reached before closing; a sketch follows after this list.  
> This ensures that whatever the source of the barrier sync (API, command 
> line, etc.), the datapath is always allowed a fair fraction of time to run. 
> (I’ve got in mind various adaptive ways of setting the delay, including a 
> rolling measure of the open period over, say, the last 1ms, and/or the Rx 
> ring state, but for initial testing a fixed value seemed easiest.)
> 
>       • From my (potentially superficial) code read, it looks like 
> vlib_worker_thread_node_runtime_update() could be called once to update the 
> workers with multiple node changes (as long as the barrier remains closed 
> between changes), rather than having to be called for each individual change.
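> 
> A minimal sketch of the first tweak (assumed names, not the exact diff; 
> vm->barrier_open_since is an added field, stamped in 
> vlib_worker_thread_barrier_release() when the barrier opens):
> 
>   #define BARRIER_MINIMUM_OPEN_TIME 300e-6   /* seconds; 300us in my tests */
> 
>   void
>   vlib_worker_thread_barrier_sync (vlib_main_t * vm)
>   {
>     if (vec_len (vlib_mains) < 2)
>       return;
> 
>     /* Recursive invocation: the barrier is already closed. */
>     if (++vlib_worker_threads[0].recursion_level > 1)
>       return;
> 
>     /* Spin until the datapath has had its minimum fair share to run,
>        whatever the source of the sync (API, command line, etc.). */
>     while (clib_time_now (&vm->clib_time)
>            < vm->barrier_open_since + BARRIER_MINIMUM_OPEN_TIME)
>       ;
> 
>     /* ... existing code: raise wait_at_barrier and spin until all
>        workers have checked in ... */
>   }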
> 
> I have therefore tweaked vlib_worker_thread_node_runtime_update() so that, 
> instead of updating the per-thread data structures, by default it simply 
> increments a count and returns.  The count is cleared each time the barrier 
> is closed in vlib_worker_thread_barrier_sync() (but not for recursive 
> invocations), and if it is non-zero when vlib_worker_thread_barrier_release() 
> is about to open the barrier, then vlib_worker_thread_node_runtime_update() 
> is called with a flag which causes it to actually do the updating.  This 
> means that the per-thread data structures are only updated once per API 
> call, rather than for each individual node change.
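> 
> Again as a sketch (the force flag and the counter name are assumed, not 
> the exact diff):
> 
>   static u32 deferred_node_runtime_updates;
> 
>   void
>   vlib_worker_thread_node_runtime_update (int force)
>   {
>     if (!force)
>       {
>         /* Barrier is closed: just note that an update is pending. */
>         deferred_node_runtime_updates++;
>         return;
>       }
>     /* ... existing body: rebuild the per-thread node runtimes ... */
>   }
> 
>   void
>   vlib_worker_thread_barrier_release (vlib_main_t * vm)
>   {
>     if (--vlib_worker_threads[0].recursion_level > 0)
>       return;   /* still nested: the barrier stays closed */
> 
>     if (deferred_node_runtime_updates)
>       {
>         /* Sync the per-thread structures exactly once per barrier. */
>         vlib_worker_thread_node_runtime_update (1 /* force */);
>         deferred_node_runtime_updates = 0;
>       }
> 
>     vm->barrier_open_since = clib_time_now (&vm->clib_time);
>     *vlib_worker_threads->wait_at_barrier = 0;   /* open the barrier */
>   }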
> 
> In my testing this change has reduced the period for which the problem API 
> calls close the barrier from multiple ms to sub-ms (generally under 500us).  
> I have not yet observed any negative consequences (though I fully accept I 
> might well have missed something).
> 
> Together these two changes eliminate the packet loss I was seeing when using 
> the API under load.
> 
> Views?
> 
> (Whilst the API packet loss is currently most important to me, I believe I 
> may have also tracked down the cause of the packet loss when issuing debug 
> commands.  It seems as if the debug commands which produce output can block 
> whilst the data is flushed, and if this occurs with the barrier down, then we 
> get similar overflow on the Rx rings.  Having said that, because the API 
> problems are more critical, I’ve not yet tried any workarounds.)
> 
> From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
> Behalf Of Colin Tregenza Dancer via vpp-dev
> Sent: 22 August 2017 15:05
> To: Neale Ranns (nranns) <nra...@cisco.com>
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Packet loss on use of API & cmdline
> 
> With my current setup (a fairly modest 2Mpps of background traffic each way 
> between a pair of 10G ports on an Intel X520 NIC, with baremetal Ubuntu 16, 
> vpp 17.01 and a couple of cores per NIC), I observed a range of different 
> packet loss scenarios:
> 
>       • 1K-80K packets lost if I issue any of a range of stats/info commands 
> from the telnet command line: “show hard”, “show int”, “show ip arp”, “show 
> ip fib”, “show fib path”.   (I haven’t yet tried the same calls via the API, 
> but from code reading would expect similar results.)
>       • Issuing an “ip route add” / “ip route del” pair from the telnet 
> command line, I see 0.5K-30K packets dropped, mainly on the del.
>       • Using the API, if I issue a closely spaced sequence of commands to 
> create a 
> new GRE tunnel and setup individual forwarding entries for 64 endpoints at 
> the other end of that tunnel, I see 100K-500K packets dropped.
> 
> Cheers,
> 
> Colin.
> 
> P.S. Have fun on the beach!
> 
> 
> From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
> Sent: 22 August 2017 14:35
> To: Colin Tregenza Dancer <c...@metaswitch.com>; Florin Coras 
> <fcoras.li...@gmail.com>
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Packet loss on use of API & cmdline
> 
> 
> Hi Colin,
> 
> Your comments were not taken as criticism :) Constructive comments are 
> always greatly appreciated.
> 
> Apart from the non-MP safe APIs Florin mentioned, and the route add/del cases 
> I covered, the consensus is certainly that packet loss should not occur 
> during a ‘typical’ update and we will do what we can to address it.
> Could you give us* some specific examples of the operations you do where you 
> see packet loss?
> 
> Thanks,
> Neale
> 
> *I say us not me as I’m about to hit the beach for a couple of weeks.