Re: [ovs-dev] [PATCH] Simplify kernel sFlow implementation

Ben Pfaff Thu, 11 Aug 2011 11:39:37 -0700

Neil, don't worry!  This really is a step forward and not a step back.
The actions passed to the kernel and back to userspace are not really
as helpful as you think, because they discard a lot of semantic
information.  These actions do say *what* happened but they do not say
*why*.  One off-the-cuff example of the difference is that, if packets
in some stream or streams are being mirrored ("spanned") to a
monitoring port separate from sFlow, then the kernel actions will just
show two outputs and we'll just report to sFlow that there were two,
whereas it might be a better idea to ignore the mirror output and
report the "primary" output port since that will presumably be more
useful for analysis at the sflow collector.


All that we were calculating from the actions before were the egress
ports, VLAN IDs, and egress priority.  That's only 48 bits total.  The
rest we can (and already do) recover from the flow data or the packet,
which will still be available.

When (if?) we do need more than 64 bits, the way forward is more work,
but perfectly doable:

        1. Instead of using the 64-bit data to store the sFlow
           information itself, switch to using it to store a unique
           ID.  (The unique ID could even be kernel-generated.)  Track
           this along with the userspace data associated with each
           flow.

        2. When an sflow message arrives from the kernel, look up the
           userspace data for it based on the unique ID and use it to
           formulate the sflow message.

        3. When a flow changes or gets deleted, keep a copy of the old
           version around in userspace as long as there might still be
           a sflow message coming along for it from the kernel.

Only step #3 is any trouble at all.  We might need to do this kind of
thing anyway, for other reasons, if we want to have perfectly accurate
flow stats in userspace.  That's because, currently, there is a (very
brief) window of time, after the kernel deletes a flow from its flow
table and sends the final statistics to userspace, in which packets
can still arrive and be processed according to the flow's previous set
of actions.  The same construct as #3 above is one possible solution.

On Thu, Aug 11, 2011 at 11:09:36AM -0700, Neil McKee wrote:
> I think the current behavior where the whole actions-list is serialized and 
> passed up with the sample is actually really elegant and future-proof.   New 
> actions are being added all the time,  and having access to a sampled-stream 
> of "exactly-what-is-happening-in-there" is always going to be good to have.  
> Not just for sFlow reporting,  but for any analysis or optimization that you 
> might want to apply.   There is a symmetry to this solution:  what you push 
> down to the kernel is a list of actions,  and what you get back on the 
> sampling feed is a packet and the list of actions that were applied to it.  
> What's not to like about that?
> 
> Furthermore,  whereas your current sFlow implementation may only be reporting 
> the ingress and egress ports, VLAN Ids,  and L2 priorities, plus the number 
> of outbound interfaces (can you get all that into 64-bits?),  that doesn't 
> mean you will never want to report more than that.   Perhaps there is already 
> something else that should be exported with the sFlow feed?   If so,  a new 
> sFlow structure can be designed and it can go out with the sampled header as 
> another annotation.  It's open-ended,  and adding new structures here will 
> never require any further change to the kernel module.
> 
> In particular,  if an open-flow controller is receiving this sFlow feed so it 
> can make network-wide optimization decisions in real time,  then it seems 
> likely that you would want to export details about all the actions.   
> Restricting the output might result in awkward blind-spots, which could make 
> these optimization algorithms exponentially harder to write.   The HPLabs 
> article referenced in this blog post is a good example of the sort of 
> network-wide optimization I am referring to:
> http://blog.sflow.com/2011/05/openflow-and-sflow.html
> 
> And a first draft to define new structures for reporting OpenFlow-related 
> data over sFlow is here:
> http://www.sflow.org/draft-sflow-openflow.txt
> but given the rapid pace of OpenFlow development,  this probably needs to be 
> reworked.
> 
> Neil
> 
> 
> 
> 
> 
> 
> 
> On Aug 9, 2011, at 9:42 PM, Ben Pfaff wrote:
> 
> > On Tue, Aug 09, 2011 at 10:46:38AM -0700, pravin shelar wrote:
> >> This patch is first step toward generic/simple sampling
> >> action which will be completely independent of sflow.
> >> Following patch cleanup sampling upcall by striping it down to
> >> minimum essential data that userspace can not figure out for
> >> itself.
> >> 
> >> Signed-off by: Pravin Shelar <[email protected]>
> > 
> > (That's "-by", not " by".)
> > 
> > I like the general approach here--the sflow support is a big wart--but I
> > think that the userspace interface changes represent a regression.
> > Before, by passing the actions down to userspace, there was always a
> > consistent view of those actions.  With this change (as comments in the
> > code make clear) the actions might have changed or the facet might even
> > have been deleted.
> > 
> > A really good fix would probably require being able to somehow keep
> > old copies of flows that have changed around for a while in userspace,
> > link them to the sflow messages, and free them when they are known not
> > to be needed anymore.  I think that's probably feasible, but in fact I
> > think that we can avoid regression with something simpler.
> > 
> > Suppose we keep the "sample pool" netlink option that this patch
> > removes, and add a way for userspace to set a small amount of user data
> > on a flow, and pass that along as the "user data" netlink option.  64
> > bits of user data should be plenty of space to store what the userspace
> > sflow implementation actually needs (16 bits of final VLAN TCI plus 32
> > bits for sFlow's idea of output port).  Plus, those same 64 bits would
> > be more than enough for the sequence number that the "really good fix"
> > mentioned in the previous paragraph would probably need.
> > 
> > With the current patch, the obvious place to put this user data would be
> > in the flow itself.  Another approach (that you have probably already
> > considered) is to make sFlow sampling a flow action, in which case the
> > user data could be that action's argument.
> > 
> > Thanks,
> > 
> > Ben.
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > http://openvswitch.org/mailman/listinfo/dev
> 
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] Simplify kernel sFlow implementation

Reply via email to