On Dec 6, 2013, at 1:18 AM, Alexander Wu <alexander...@huawei.com> wrote:
> Hi Jarno,
>
> I've read your patch "better count1_bits", and I tested the gcc
> builtins separately.
>
> Call __builtin_popcount|__builtin_popcountl|__builtin_popcountll 10 million
> times
> --------------------------------------
> suse-kvm-of13:/test # time ./bit4
>
> real 0m0.034s
> user 0m0.032s
> sys 0m0.000s
>
> Call count1_bits 10 million times
> --------------------------------------
> suse-kvm-of13:/test # time ./bit1
>
> real 0m0.080s
> user 0m0.076s
> sys 0m0.000s
>
> Looks good, but I have a problem, described below.
> My cpuinfo: 16U * Intel(R) Xeon(R) CPU E5620 @ 2.40GHz. (westmere)
> I read the gcc source and found M_INTEL_COREI7_WESTMERE, which seems
> to say that westmere is a corei7, but the following code doesn't work:
>
> #if defined(__corei7)
> int i;
> for (i = 0; i < 10000000; i++)
>     __builtin_popcount(i);
> #endif
>

You need to tell gcc to compile for your processor:

$ echo | gcc -dM -E - | grep core
$ echo | gcc -march=native -dM -E - | grep core
#define __corei7 1
#define __tune_corei7__ 1
#define __corei7__ 1
$

Also, you need to be careful both to let the compiler optimize, as we do
when building OVS (-O2), and to make sure the test cases are not
optimized away.

> I believe there are some particular CPUs for which the builtin popcount
> is suitable; is there any way to represent them?
>

I think it is trial and error, since the builtin popcount performs poorly
without direct CPU support.

> On 06/12/2013 12:26, Ben Pfaff wrote:
>>
>> But I'm inclined to believe that a 65536-byte array wastes too much
>> memory.
>>

I’m inclined to agree that it might waste too much (L1 cache) memory.

  Jarno

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev