On Mon, Jul 16, 2012 at 09:43:01PM +0400, Alexander V. Chernikov wrote:
> On 06.07.2012 10:11, Luigi Rizzo wrote:
> >On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov wrote:
> >>On 04.07.2012 19:48, Luigi Rizzo wrote:
> >the thing discussed a few years ago (at least the one i took out of the
> >discussion) was that the counter fields in rules should hold the
> >index of a per-cpu counter associated to the rule. So CTR_INC(rule->ctr)
> >becomes something like pcpu->ipfw_ctrs[rule->ctr]++
> >Once you create a new rule you also grab one free index from ipfw_ctrs[],
> >and the same should go for dummynet counters.
>
> Old kernel from previous letters, same setup:
>
> net.inet.ip.fw.enable=0
> 2.3 MPPS
> net.inet.ip.fw.update_counters=0
> net.inet.ip.fw.enable=1
> 1.93MPPS
> net.inet.ip.fw.update_counters=1
> 1.74MPPS
>
> Kernel with ipfw pcpu counters:
>
> net.inet.ip.fw.enable=0
> 2.3 MPPS
> net.inet.ip.fw.update_counters=0
> net.inet.ip.fw.enable=1
> 1.93MPPS
> net.inet.ip.fw.update_counters=1
> 1.93MPPS
>
> Counters seems to be working without any (significant) overhead.
> (Maybe I'm wrong somewhere?)
>
> Additionally, I've got (from my previous pcpu attempt) a small patch
> permitting ipfw to re-use rule map allocation instead of reallocating
> on every rule. This saves a bit of system time:
>
> loading 20k rules with ipfw binary gives us:
> 5.1s system time before and 4.1s system time after.
>
I do not think that your 'per-cpu' counters are correct. Thread migration or rescheduling between looking up the per-cpu structure and updating it can cause the fetch or update to hit the wrong CPU's structure. That allows parallel updates of the same counter, with undefined consequences. At a minimum, you need to disable preemption around the counter structure dereference and the increment.
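To make the point concrete, here is a minimal userspace sketch of the pattern, not the actual patch. It assumes the layout from the quoted mail (a per-cpu array ipfw_ctrs[] indexed by a slot stored in the rule). Everything prefixed sim_, the NCPU/NCTRS sizes, and ctr_inc()/ctr_fetch() are hypothetical names made up for illustration. The sim_critical_enter()/sim_critical_exit() pair only mimics what critical_enter()/critical_exit() do in the kernel: while the section is held, the thread cannot be moved to another CPU, so the dereference and the increment touch the same per-cpu block.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU  4   /* hypothetical CPU count for this sketch */
#define NCTRS 8   /* hypothetical number of counter slots */

/* Hypothetical per-cpu block; ipfw_ctrs[] follows the quoted mail. */
struct pcpu_sim {
	uint64_t ipfw_ctrs[NCTRS];
};

static struct pcpu_sim pcpu_sim[NCPU];
static int sim_cur_cpu;        /* stand-in for "the CPU we run on" */
static int sim_crit_depth;     /* > 0: preemption/migration disabled */

/* Userspace stand-ins for the kernel's critical section primitives. */
static void
sim_critical_enter(void)
{
	sim_crit_depth++;
}

static void
sim_critical_exit(void)
{
	assert(sim_crit_depth > 0);
	sim_crit_depth--;
}

/* Simulated scheduler: refuses to migrate inside a critical section. */
static int
sim_try_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (sim_crit_depth > 0)
		return (0);
	sim_cur_cpu = cpu;
	return (1);
}

/* Increment slot idx on the current CPU with migration blocked. */
static void
ctr_inc(int idx)
{
	struct pcpu_sim *pc;

	assert(idx >= 0 && idx < NCTRS);
	sim_critical_enter();
	pc = &pcpu_sim[sim_cur_cpu];   /* dereference ...            */
	pc->ipfw_ctrs[idx]++;          /* ... and update, same CPU   */
	sim_critical_exit();
}

/* Reading a counter means summing the slot over all CPUs. */
static uint64_t
ctr_fetch(int idx)
{
	uint64_t sum = 0;
	int i;

	assert(idx >= 0 && idx < NCTRS);
	for (i = 0; i < NCPU; i++)
		sum += pcpu_sim[i].ipfw_ctrs[idx];
	return (sum);
}
```

Without the enter/exit pair, a migration between the pcpu lookup and the ++ would leave two threads doing a non-atomic read-modify-write on the same slot, which is exactly the parallel update described above. The reader side pays for the cheap increment by having to sum over all CPUs.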