On Aug 18, 2011, at 6:55 PM, Jesse Gross wrote:

> * Atomic operations are quite slow, which means that enabling sFlow results
> in a major performance hit.
I was alarmed to read this. What is the hit? (I trust your test had the sampling probability set so that it only takes a handful of samples per second?)

Looking at actions.c: sflow_sample(), is it really just the "atomic_inc(&p->sflow_pool);" line that does the damage?

What about net_random()? I don't know where to look for the details on this one. I think Ben said it was about 40 cycles. How does it avoid using a lock or atomic instruction? Does it maintain separate random-number seeds per thread or per CPU?

Does the compiler tend to inline the sflow_sample() function? Should we sprinkle some more "unlikely()" branch-prediction hints?

For another project we've been experimenting with an approach that looks like this:

    if (atomic_decrement(&countdown) == 0) {
        <take sample>
        for (;;) {
            if (atomic_add(&countdown, compute_next_skip()) > 0)
                break;
            drops++;
        }
    }

Only one thread will see the countdown transition from 1->0, so it's the same as holding a lock. That means you can use whatever random-number generator you want in compute_next_skip(). In the very rare corner case where your next skip doesn't get countdown back above 0 again, you just register a dropped sample and try again. The only step in the critical path is the atomic_decrement(), but it sounds like we need to rethink this and try to avoid even that atomic_decrement any way we can?

Neil
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev