On Tue, Nov 26, 2013 at 04:54:22PM -0800, Jarno Rajahalme wrote:
>
> >> On Nov 18, 2013, at 1:19 PM, Ben Pfaff <b...@nicira.com> wrote:
> >>
> >> This also restores use, in practice, of the optimized implementation of
> >> population count. (As the comment on popcount32() says, this version is
> >> 2x faster than __builtin_popcount().)
> >>
>
> I just tested the builtin popcountll with -march=native on i7. It is
> about 4x faster than our current version and about 8x faster than the
> builtin on a generic build.

-march=native produces nonportable code, so we can't use that for generic
builds.  See the GCC manual:

    _native_
         This selects the CPU to tune for at compilation time by
         determining the processor type of the compiling machine.
         Using `-mtune=native' will produce code optimized for the
         local machine under the constraints of the selected
         instruction set.  Using `-march=native' will enable all
         instruction subsets supported by the local machine (hence the
         result might not run on different machines).

(It probably uses the POPCNT instruction, did you check?)
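
To make the portability trade-off concrete, here is a minimal sketch of one way to use the builtin only when the build already targets a CPU with POPCNT, and to fall back to portable code otherwise. The function names popcount64_portable() and popcount64() are hypothetical and are not the tree's actual implementation. The sketch relies on GCC and Clang predefining __POPCNT__ when POPCNT code generation is enabled (for example with -mpopcnt, or with -march=native on a machine that has the instruction).

```c
#include <stdint.h>

/* Hypothetical portable fallback (not the OVS implementation): the
 * classic parallel bit-summing population count.  It uses only shifts,
 * masks, and one multiply, so it runs on any CPU. */
static unsigned int
popcount64_portable(uint64_t x)
{
    /* Sum adjacent bits into 2-bit fields. */
    x -= (x >> 1) & UINT64_C(0x5555555555555555);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));
    /* Sum adjacent 4-bit fields into bytes. */
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    /* Add all bytes together; the total lands in the top byte. */
    return (unsigned int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Hypothetical wrapper: picks an implementation at compile time. */
static unsigned int
popcount64(uint64_t x)
{
#if defined(__POPCNT__)
    /* The build already requires a CPU with POPCNT, so the builtin can
     * compile to that instruction without reducing portability further. */
    return (unsigned int) __builtin_popcountll(x);
#else
    /* Generic build: stay with code that runs everywhere. */
    return popcount64_portable(x);
#endif
}
```

With this arrangement a generic build keeps the portable path, and someone who deliberately builds with -mpopcnt or -march=native gets the builtin. Whether the compiler actually emits POPCNT in that case can be confirmed by looking at the generated assembly (gcc -S).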