On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
> Inline, use another well-known algorithm for 64-bit builds, and use
> builtins when they are known to be fast at compile time.  A 32-bit
> version of the alternate algorithm is slower than the existing
> implementation, so the old one is used for 32-bit builds.  Inline
> assembler would be a bit faster on 32-bit i7 build, but we use the GCC
> builtin for portability.
>
> It should be stressed builds for specific CPUs do not work on others
> CPUs, and that OVS build system or runtime does not currently support
> CPU detection.
>
> Speed improvement v.s. existing implementation / GCC 4.7
> __builtin_popcountll():
>
> i386:          64% (inlining) / 380%
> i386 on i7:   240% (inlining + builtin) / 820%
> x86_64:        59% (inlining + different algorithm) / 190%
> x86_64 on i7: 370% (inlining + builtin) / 0%
>
> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>

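As a rough illustration of the approach described above (this is not the
code from the patch), the following sketch picks the GCC builtin only when
the compiler has been told at build time that the target CPU has a POPCNT
instruction, and otherwise falls back to a portable 64-bit bit-counting
routine.  The function name sketch_count_1bits() is hypothetical, and the
fallback shown is the common SWAR (parallel sum) population count, which is
an assumption: the description does not say which "well-known algorithm"
the patch uses.

```c
#include <stdint.h>

/* Hypothetical name, for illustration only; not the function in the patch. */
static inline unsigned int
sketch_count_1bits(uint64_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    /* GCC defines __POPCNT__ when the build targets a CPU with POPCNT
     * (e.g. -mpopcnt), so the builtin is known to be fast at compile
     * time.  A binary built this way does not run on CPUs without the
     * instruction. */
    return (unsigned int) __builtin_popcountll(x);
#else
    /* Portable fallback (assumed algorithm): SWAR population count.
     * Sum bits in 2-bit fields, then 4-bit fields, then bytes, and
     * finally add all bytes together with one multiplication. */
    x -= (x >> 1) & UINT64_C(0x5555555555555555);
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    return (unsigned int) ((x * UINT64_C(0x0101010101010101)) >> 56);
#endif
}
```

Keeping the selection in the preprocessor means there is no runtime CPU
detection, which matches the limitation stated in the description.
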
Wow.  How did you measure the benefit of inlining?