On Aug 18, 2011, at 6:55 PM, Jesse Gross wrote:

> * Atomic operations are quite slow, which means that enabling sFlow results
> in a major performance hit.
I was alarmed to read this. What is the hit? (I trust your test had the sampling probability set so that it only takes a handful of samples per second?)

Looking at actions.c: sflow_sample(), is it really just the "atomic_inc(&p->sflow_pool);" line that does the damage?

What about net_random()? I don't know where to look for the details on this one. I think Ben said it was about 40 cycles. How does it avoid using a lock or atomic instruction? Does it maintain separate random-number seeds per thread or per CPU?

Does the compiler tend to inline the sflow_sample() function? Should we sprinkle some more "unlikely()" branch-prediction hints?

For another project we've been experimenting with an approach that looks like this:

    if (atomic_decrement(&countdown) == 0) {
        <take sample>
        for (;;) {
            if (atomic_add(&countdown, compute_next_skip()) > 0)
                break;
            drops++;
        }
    }

Only one thread will see the countdown transition from 1->0, so it's the same as holding a lock. That means you can use whatever random-number generator you want in compute_next_skip(). In the very rare corner case where your next skip doesn't get countdown back above 0 again, you just register a dropped sample and try again. The only step in the critical path is the atomic_decrement(), but it sounds like we need to rethink this and try to avoid even that atomic_decrement any way we can?

Neil
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev