Pushed as-is, leaving Ben’s preference as an opportunity for future improvement.
Jarno
On Dec 6, 2013, at 9:24 AM, Ben Pfaff <b...@nicira.com> wrote:
> On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
>> Inline, use another well-known algorithm for 64-bit builds, and use
>> builtins when they are known to be fast at compile time. A 32-bit
>> version of the alternate algorithm is slower than the existing
>> implementation, so the old one is used for 32-bit builds. Inline
>> assembler would be a bit faster on a 32-bit i7 build, but we use the GCC
>> builtin for portability.
>>
>> It should be stressed that builds for specific CPUs do not work on other
>> CPUs, and that neither the OVS build system nor the runtime currently
>> supports CPU detection.
>>
>> Speed improvement vs. existing implementation / GCC 4.7
>> __builtin_popcountll():
>>
>> i386: 64% (inlining) / 380%
>> i386 on i7: 240% (inlining + builtin) / 820%
>> x86_64: 59% (inlining + different algorithm) / 190%
>> x86_64 on i7: 370% (inlining + builtin) / 0%
>>
>> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
>
> Instead of defined(__corei7), I would write __POPCNT__, a GCC macro
> specific to popcnt instruction support. I don't think that __corei7 is
> a good test because it is too specific: successors to Core i7 will
> almost certainly also have POPCNT.
>
You are absolutely right about this. However, I’m running out of time for today
and decided to push this now rather than wait.
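For whoever picks this up later, here is a rough sketch of what the
__POPCNT__ test could look like. This is not the code that was pushed:
count_1bits_sketch() is a hypothetical name, and the fallback shown is the
common bit-parallel (SWAR) population count, which I am assuming here only
for illustration and which may differ from the algorithm in the patch.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only (not the OVS function).
 * Returns the number of 1-bits in 'x'. */
static inline unsigned int
count_1bits_sketch(uint64_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    /* GCC defines __POPCNT__ whenever the target is known to support the
     * popcnt instruction (for example with -mpopcnt or a suitable -march),
     * so this test also covers successors to Core i7. */
    return __builtin_popcountll(x);
#else
    /* Portable fallback: bit-parallel (SWAR) count.  Sum adjacent bits
     * into 2-bit fields, then 4-bit fields, then bytes, and finally add
     * all bytes together with a multiply. */
    x -= (x >> 1) & UINT64_C(0x5555555555555555);
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    return (x * UINT64_C(0x0101010101010101)) >> 56;
#endif
}
```

Both branches return the same result for any input, so the choice only
affects speed. As noted in the commit message, a binary built with the
builtin branch enabled will not run on CPUs that lack the instruction.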
> Acked-by: Ben Pfaff <b...@nicira.com>
Thanks!
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev