On 07/12/2013 00:13, Jarno Rajahalme wrote:

On Dec 6, 2013, at 1:18 AM, Alexander Wu <alexander...@huawei.com> wrote:

Hi Jarno,

I've read your patch "better count1_bits", and I tested the gcc
builtins separately.

Call __builtin_popcount|__builtin_popcountl|__builtin_popcountll 10 million times
--------------------------------------
    suse-kvm-of13:/test # time ./bit4

    real    0m0.034s
    user    0m0.032s
    sys     0m0.000s

Call count1_bits 10 million times
--------------------------------------
    suse-kvm-of13:/test # time ./bit1

    real    0m0.080s
    user    0m0.076s
    sys     0m0.000s

Looks good, but I have a problem, described below.
My cpuinfo: 16U * Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (Westmere).
I read the gcc source and found M_INTEL_COREI7_WESTMERE, which seems
to say that Westmere is a corei7, but the following code doesn't work:

    #if defined(__corei7)
        int i;
        for (i = 0; i < 10000000; i++)
            __builtin_popcount(i);
    #endif


You need to tell gcc to compile for your processor:

$ echo | gcc -dM -E - | grep core
$ echo | gcc -march=native -dM -E - | grep core
#define __corei7 1
#define __tune_corei7__ 1
#define __corei7__ 1
$
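
As an alternative to testing CPU-model macros such as __corei7, GCC and Clang
also define __POPCNT__ whenever the selected target has the POPCNT instruction
(for example with -mpopcnt or a -march= value that includes it). The sketch
below is only an illustration of that idea, not something proposed in this
thread; popcount32 and popcount32_fallback are hypothetical names, and the
fallback is a generic bit-parallel count, not the OVS implementation.

```c
#include <stdint.h>

/* Portable fallback: bit-parallel (SWAR) population count. */
static inline unsigned int
popcount32_fallback(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                      /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Use the builtin only when the compiler says the target has POPCNT;
 * otherwise the builtin may expand to a slow library routine. */
static inline unsigned int
popcount32(uint32_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    return (unsigned int) __builtin_popcount(x);
#else
    return popcount32_fallback(x);
#endif
}
```

Whether __POPCNT__ is defined for a given set of flags can be checked the same
way as above, by grepping the output of "gcc -march=native -dM -E -".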

Also, you need to let the compiler optimize as we do when building OVS (-O2),
while making sure the test cases are not optimized away.
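
One common way to keep a benchmark loop alive under -O2 is to make every
result feed into a value that is observably used. A minimal sketch, assuming
GCC or Clang; popcount_sum, bench_once and sink are illustrative names, not
part of OVS:

```c
#include <stddef.h>
#include <stdint.h>

/* Written once per benchmark run; 'volatile' forces the store to happen. */
static volatile uint64_t sink;

/* Sum the popcounts of all inputs so that every call's result is needed. */
static uint64_t
popcount_sum(const uint32_t *values, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += (uint64_t) __builtin_popcount(values[i]);
    }
    return total;
}

/* One timed unit of work: the volatile store keeps the loop from being
 * discarded as dead code. */
static void
bench_once(const uint32_t *values, size_t n)
{
    sink = popcount_sum(values, n);
}
```

The inputs should come from data the compiler cannot see at compile time
(for example rand()), otherwise the whole sum may still be folded into a
constant.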

I believe there are some particular CPUs for which __builtin_popcount
is suitable. Is there any way to identify them?


I think it is trial and error, since the builtin popcount is kind of bad 
without direct CPU support.

On 06/12/2013 12:26, Ben Pfaff wrote:

But I'm inclined to believe that a 65536-byte array wastes too much
memory.


I’m inclined to agree that it might waste too much (L1 cache) memory.

    Jarno


Hi Jarno,

I found that my gcc predefines __core2, but the performance seems to be worse
when I add '-O2'. I'm not sure whether that reflects reality.

Here is part of my test code, the compile command, and the results.

Code:

    uint32_t i, last_bits;
    struct timespec start = {0};
    struct timespec end = {0};
    int r;
#define N_LOOP 100000
    int random_array[N_LOOP];

    srand(time(NULL));
    for (i = 0; i < N_LOOP; i++) {
        r = rand();
        random_array[i] = r;
    }

//__builtin_popcount
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    for (i = 0; i < N_LOOP; i++) {
        last_bits = __builtin_popcount(random_array[i]);
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
    printf("last-bits:%d\n", last_bits);

//original ovs count_1bits_32
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    for (i = 0; i < N_LOOP; i++) {
        last_bits = count_1bits_32(random_array[i]);
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
    printf("last-bits:%d\n", last_bits);

//simple foo function, to measure the cost of '=' and the function call.
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    for (i = 0; i < N_LOOP; i++) {
        last_bits = foo();
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
    printf("last-bits:%d\n", last_bits);

Compile:
    gcc bit1.c -o bit1 -march=native -mtune=native -lrt -O2  && ./bit1

Result:

    time-diff:1063893 //__builtin_popcount
    last-bits:10
    time-diff:293463  //original ovs count_1bits_32
    last-bits:10
    time-diff:188     //simple foo function, to measure the cost of '=' and the function call (maybe it has been optimized out)
    last-bits:99999

Result without -O2:

    time-diff:1317450
    last-bits:10
    time-diff:991438
    last-bits:10
    time-diff:416265
    last-bits:99999


Note that I use last_bits to store the return value; when I do, the
performance of __builtin_popcount seems to decrease. I guess that otherwise
the compiler is free to optimize the __builtin_popcount call as it wishes,
as with -O2.

So do you think this is enough to show that __builtin_popcount is not
suitable for __core2?
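
One caveat about the timing in the test code above: it subtracts only the
tv_nsec fields, which gives a wrong (possibly negative) value if the second
counter advances between the two clock_gettime() calls. A hedged sketch of a
helper that uses both fields; timespec_diff_ns is a hypothetical name, not
an existing OVS or libc function:

```c
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from 'start' to 'end', using both tv_sec and tv_nsec. */
static int64_t
timespec_diff_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t) (end->tv_sec - start->tv_sec) * 1000000000LL
           + (end->tv_nsec - start->tv_nsec);
}
```

For example, with start = 1.9 s and end = 2.1 s, the tv_nsec difference alone
is -800000000, while the helper returns 200000000. The result would be printed
with a 64-bit format, e.g. printf("%lld\n", (long long) diff).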

Best regards,
Alexander Wu


_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
