On Tue, Nov 26, 2013 at 04:54:22PM -0800, Jarno Rajahalme wrote:
> 
> >> On Nov 18, 2013, at 1:19 PM, Ben Pfaff <b...@nicira.com> wrote:
> >> 
> >> This also restores use, in practice, of the optimized implementation of
> >> population count.  (As the comment on popcount32() says, this version is
> >> 2x faster than __builtin_popcount().)
> >> 
> 
> I just tested __builtin_popcountll() with -march=native on an i7. It is
> about 4x faster than our current version and about 8x faster than the
> same builtin in a generic build.

-march=native produces nonportable code, so we can't use it for generic
builds; see the GCC manual:

    _native_
          This selects the CPU to tune for at compilation time by
          determining the processor type of the compiling machine.
          Using `-mtune=native' will produce code optimized for the
          local machine under the constraints of the selected
          instruction set.  Using `-march=native' will enable all
          instruction subsets supported by the local machine (hence the
          result might not run on different machines).

(It probably uses the POPCNT instruction; did you check?)
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
