On Aug 18, 2011, at 6:55 PM, Jesse Gross wrote:
> * Atomic operations are quite slow, which means that enabling sFlow results
> in a major performance hit.
I was alarmed to read this. What is the hit? (I trust your test had the
sampling probability set so that it only takes a handful of samples per second?)
Looking at actions.c: sflow_sample(), is it really just the
"atomic_inc(&p->sflow_pool);" line that does the damage?
What about net_random()? I don't know where to look for the details on this
one. I think Ben said it was about 40 cycles. How does it avoid using a lock
or atomic instruction? Does it maintain separate random-number seeds per
thread or per CPU?
Does the compiler tend to inline the sflow_sample() function?
Should we sprinkle some more "unlikely()" branch-prediction hints?
For another project we've been experimenting with an approach that looks like
this:
    if (atomic_decrement(&countdown) == 0) {
            <take sample>
            for (;;) {
                    if (atomic_add(&countdown, compute_next_skip()) > 0)
                            break;
                    drops++;
            }
    }
Only one thread will see the countdown transition from 1 -> 0, so it's the same
as holding a lock. That means you can use whatever random-number generator you
want in compute_next_skip(). In the very rare corner case where your next skip
doesn't get countdown back above 0, you just register a dropped sample and try
again. The only step in the critical path is the atomic_decrement(), but it
sounds like we need to rethink this and avoid even that atomic_decrement any
way we can?
Neil
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev