On Dec 6, 2013, at 1:18 AM, Alexander Wu <alexander...@huawei.com> wrote:

> Hi Jarno,
> 
> I've read your patch "better count1_bits", and I tested the gcc
> builtins separately.
> 
> Call __builtin_popcount|__builtin_popcountl|__builtin_popcountll 10 million times
> --------------------------------------
>    suse-kvm-of13:/test # time ./bit4
> 
>    real    0m0.034s
>    user    0m0.032s
>    sys     0m0.000s
> 
> Call count1_bits 10 million times
> --------------------------------------
>    suse-kvm-of13:/test # time ./bit1
> 
>    real    0m0.080s
>    user    0m0.076s
>    sys     0m0.000s
> 
> Looks good, but I have a problem, described below.
> My cpuinfo: 16U * Intel(R) Xeon(R) CPU E5620 @ 2.40GHz. (westmere)
> I've read the gcc source and found M_INTEL_COREI7_WESTMERE, which seems
> to say that Westmere is a corei7, but the following code doesn't work:
> 
>    #if defined(__corei7)
>        int i;
>        for (i = 0; i < 10000000; i++)
>            __builtin_popcount(i);
>    #endif
> 

You need to tell gcc to compile for your processor:

$ echo | gcc -dM -E - | grep core
$ echo | gcc -march=native -dM -E - | grep core
#define __corei7 1
#define __tune_corei7__ 1
#define __corei7__ 1
$ 

Also, you need to be careful to let the compiler optimize the same way we do
when building OVS (-O2), while making sure the test cases themselves are not
optimized away.
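
As a minimal sketch of what I mean (bench_popcount is a hypothetical helper
name, not something from the patch): if the loop adds up the results and
returns the total, the compiler has a reason to keep the popcount calls at -O2.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: sums the population counts
 * of 0..n-1 so that the result of every builtin call is actually used. */
uint64_t
bench_popcount(uint32_t n)
{
    uint64_t total = 0;
    uint32_t i;

    for (i = 0; i < n; i++) {
        total += __builtin_popcount(i);
    }
    return total;
}
```

The caller should still consume the return value, for example by printing it
or storing it in a volatile variable, or the whole call may be dropped.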

> I believe there are some particular CPUs which __builtin_popcount
> is suitable for; is there any way to represent them?
> 

I think it comes down to trial and error, since the builtin popcount performs
poorly when the CPU has no popcount instruction.
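
One possible way to express this at compile time, as a sketch only: GCC
defines __POPCNT__ when the POPCNT instruction is enabled (for example with
-mpopcnt, or with -march=native on a CPU that has it), so the builtin could be
used only in that case. The name popcount32 and the bit-parallel fallback are
illustrative assumptions here, not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical wrapper, not the code from the patch. */
static inline unsigned int
popcount32(uint32_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    /* POPCNT is enabled for this build, so the builtin compiles to a
     * single instruction. */
    return __builtin_popcount(x);
#else
    /* Portable bit-parallel fallback: add up bits in 2-, 4- and 8-bit
     * groups, then sum the four byte counts with a multiply. */
    x -= (x >> 1) & 0x55555555;
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0f0f0f0f;
    return (x * 0x01010101) >> 24;
#endif
}
```

Whether the builtin path is actually faster on a given machine would still
need to be measured.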

> On 06/12/2013 12:26, Ben Pfaff wrote:
>> 
>> But I'm inclined to believe that a 65536-byte array wastes too much
>> memory.
>> 

I’m inclined to agree that it might waste too much (L1 cache) memory.

   Jarno
