Hi Chris,

Firstly, apologies for the lengthy delay. 

When I say 'state' in the following I'm referring to some object[s] that are 
used to forward packets. 

I'd classify the possible solution space as:
1) maintain per-packet counters on the state to indicate how many packets 
currently refer to that state.
     Pros: we know exactly when the state is no longer required and can be 
safely removed.
     Cons: significant per-packet cost, comparable to maintaining adjacency 
counters. For reference, on my [aging] system, enabling adjacency counters 
takes ip4-rewrite from 2.52e1 to 3.49e1 clocks. The wait times could also be 
large (equivalent to flushing queues).
2) flush queues; ensure that there are no packets in flight, anywhere, when the 
workers stop at the barrier.
    Pros: it's certainly safe to delete state under these conditions.
    Cons: for handoff queues the drain point could be known, though the wait 
time would be long. For async crypto HW it may not be knowable at all, and even 
if it is, the wait times would be large. Either way we may end up waiting for a 
worst-case scenario, which is far longer than actually needed.
3) epochs; maintain a global epoch that is bumped each time an API is called. 
Packets entering the system are stamped with the current epoch. If a node sees 
a packet whose epoch does not match the global one, the packet is dropped (see 
the sketch after this list).
    Pros: simple scheme with low/negligible DP cost.
    Cons: every API call drops all in-flight packets, not just the packets 
that would use the state being deleted.
4) MP safe; remove the state with the workers unblocked. This is a multi-stage 
process. Firstly, unlink the state from the lookup data-structures so no more 
packets can find it. Secondly, 'atomically' update the state so that packets 
still using it perform a consistent action (probably drop). Thirdly, don't 
reuse that state (i.e. don't recycle its pool index) until all in-flight 
packets have passed through the system, since mis-forwarding must be avoided. 
Make-before-break, if that term means anything to you :)
    Pros: MP safe is always good, since there are fewer packet drops. Zero 
per-packet DP cost.
    Cons: it's not easy to get right, nor to test.
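
To make option 3 concrete, here's a minimal sketch of the epoch scheme. All 
the names (global_epoch, stamp_packet, etc.) are hypothetical illustrations, 
not existing VPP symbols:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t global_epoch;

/* Control plane: bump the epoch on every state-changing API call. */
static inline void
api_bump_epoch (void)
{
  atomic_fetch_add (&global_epoch, 1);
}

/* Input node: stamp each packet with the epoch current on entry. */
static inline void
stamp_packet (uint64_t *pkt_epoch)
{
  *pkt_epoch = atomic_load (&global_epoch);
}

/* Any downstream node: a stale stamp means some API call happened while
 * the packet was in flight, so the packet is dropped. */
static inline int
packet_is_stale (uint64_t pkt_epoch)
{
  return pkt_epoch != atomic_load (&global_epoch);
}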

IMHO the drawbacks of options 1, 2 & 3 rule them out, which leaves only option 4.

For option 4, the first and second steps are very much dependent on the type of 
state we're talking about. For SAs, for example, unlinking the SA from the 
lookup data-structure is accomplished using a separate API from the SA delete*. 
The final step we could easily accomplish with a new version of the pool 
allocator whose free-list prevents reuse for, say, 5 seconds (an age in DP 
terms). Sketches of the second and third steps follow.
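
First, a rough sketch of the unlink-then-retarget part (steps one and two), 
assuming a hypothetical fwd_state_t; none of these names are existing VPP APIs:

#include <stdatomic.h>

/* Hypothetical forwarding state. In-flight packets may still hold its
 * pool index, so the object must stay valid after 'deletion'. */
typedef struct
{
  _Atomic (void *) action; /* what a packet using this state does */
} fwd_state_t;

extern void *drop_action;                        /* handler that drops */
extern void lookup_table_unlink (fwd_state_t *); /* table-specific */

static inline void
fwd_state_retire (fwd_state_t *s)
{
  /* Step 1: no new packets can find the state... */
  lookup_table_unlink (s);
  /* Step 2: ...and packets already holding it take a consistent
   * action, i.e. drop. */
  atomic_store (&s->action, drop_action);
  /* Step 3: the pool index is recycled only after a grace period;
   * see the delayed-reuse free-list below. */
}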
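
And a sketch of the delayed-reuse free-list for the final step. This is not 
the existing VPP pool allocator; it only illustrates time-stamping freed 
indices and recycling them once they are older than a grace period:

#include <stdint.h>
#include <time.h>

#define REUSE_DELAY_SECS 5 /* an age in DP terms */

typedef struct
{
  uint32_t index;  /* the freed pool index */
  time_t freed_at; /* when it was freed */
} delayed_free_t;

/* FIFO of freed indices, oldest at head (overflow handling elided). */
typedef struct
{
  delayed_free_t *ring;
  uint32_t head, tail, size;
} delayed_free_list_t;

/* On free: never hand the index back immediately; queue it instead. */
static inline void
pool_put_delayed (delayed_free_list_t *fl, uint32_t index)
{
  fl->ring[fl->tail] = (delayed_free_t) { index, time (NULL) };
  fl->tail = (fl->tail + 1) % fl->size;
}

/* On alloc: recycle an index only once it has aged past the delay;
 * if none qualifies, the caller grows the pool instead. */
static inline int
pool_get_delayed (delayed_free_list_t *fl, uint32_t *index)
{
  if (fl->head == fl->tail)
    return 0; /* nothing on the free-list */
  if (time (NULL) - fl->ring[fl->head].freed_at < REUSE_DELAY_SECS)
    return 0; /* too fresh; in-flight packets may still reference it */
  *index = fl->ring[fl->head].index;
  fl->head = (fl->head + 1) % fl->size;
  return 1;
}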

Thoughts?

/neale

* I note that an SA delete is already (optimistically) marked MP safe, which 
assumes the system flushes in between these API calls.




On 26/03/2020 16:09, "Christian Hopps" <cho...@chopps.org> wrote:

    
    
    > On Mar 25, 2020, at 1:39 PM, Dave Barach via Lists.Fd.Io 
<dbarach=cisco....@lists.fd.io> wrote:
    > 
    > vlib_main_t *vm->main_loop_count.
    > 
    > One trip around the main loop accounts for all per-worker local graph 
edges / acyclic graph behaviors. 
    > 
    > As to the magic number E (not to be confused with e): repeatedly handing 
off packets from thread to thread seems like a bad implementation strategy. The 
packet tracer will tell you how many handoffs are involved in a certain path, 
as will a bit of code inspection.
    
    No, it would not be a good implementation strategy. :)
    
    However, I was looking at trying to code this in an upstreamable way, and I 
didn't think I got to make assumptions about how others might wire things 
together. I suppose we could just define a maximum number of handoffs and then 
if users violated that number they would need to increase it?
    
    > Neale has some experience with this scenario, maybe he can share some 
thoughts...
    
    Hoping so. :)
    
    I noticed that crypto engine handoffs were added to the non-dpdk ipsec 
encrypt/decrypt in master, which seems somewhat relevant.
    
    Thanks,
    Chris.
