Hi!

> When some objects are allocated by one CPU but freed by another CPU we can
> consume a lot of cycles doing divides in obj_to_index().
>
> (Typical load on a dual processor machine where network interrupts are
> handled by one particular CPU (allocating skbufs), and the other CPU is
> running the application (consuming and freeing skbufs).)
>
> Here on one production server (dual-core AMD Opteron 285), I noticed this
> divide took 1.20 % of CPU_CLK_UNHALTED events in the kernel. But Opterons are
> quite modern CPUs, and the divide is much more expensive on older
> architectures:
>
> On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1 cycle
> for a multiply.
>
> Doing some math, we can use a reciprocal multiplication instead of a divide.
>
> If we want to compute V = (A / B) (A and B being u32 quantities)
> we can instead use:
>
> V = ((u64)A * RECIPROCAL(B)) >> 32 ;
>
> where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
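A minimal userspace sketch of the quoted trick, assuming `<stdint.h>` types in place of the kernel's u32/u64; the helper names reciprocal_value() and reciprocal_divide() are hypothetical here. RECIPROCAL(B) is ceil(2^32 / B), computed once per divisor, after which each divide becomes one 32x32->64 multiply and a shift. Because the reciprocal is rounded up, the result is exact only while A * (B - 1) < 2^32; beyond that it can be one too large.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: RECIPROCAL(b) = ((1 << 32) + (b - 1)) / b,
 * i.e. ceil(2^32 / b). b must be at least 2: for b == 1 the result
 * would be 2^32, which does not fit in 32 bits.
 */
static uint32_t reciprocal_value(uint32_t b)
{
	assert(b >= 2);

	uint64_t val = (1ULL << 32) + (b - 1);
	return (uint32_t)(val / b);
}

/*
 * Hypothetical helper: approximate a / b as (a * RECIPROCAL(b)) >> 32.
 * The product of two u32 values always fits in 64 bits. The result is
 * exact whenever a * (b - 1) < 2^32; for larger a it may be one more
 * than the true quotient, never less.
 */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

In obj_to_index() the dividend is presumably a byte offset within one slab, well inside the exact range. Outside it the shortcut breaks: with B = 7 and A = 1431655770 it returns 204522253, while the true quotient is 204522252.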
Well, I guess it should be gcc doing this optimization, not us by hand. And I
believe gcc *is* smart enough to do it in some cases...

pavel
-- 
Thanks for all the (sleeping) penguins.