On Dec 8, 2013, at 7:34 PM, Alexander Wu <alexander...@huawei.com> wrote:

> Hi Jarno,
> 
> I get my gcc predefined __core2. But its performance seems to be worse when
> I add '-O2'. Not sure if it's the reality.
> 

From the numbers below it seems that performance is better with -O2 (1063893 <
1317450), so I’m not sure what you mean here.


> Here are part of my test code, compile command and its result.
> 
> Code:
> 
>    uint32_t i, last_bits;
>    struct timespec start = {0};
>    struct timespec end = {0};
>    srand(time(NULL));
>    int r = rand();
> #define N_LOOP 100000
>    int random_array[N_LOOP];
> 
>    srand(time(NULL));
>    for (i = 0; i < N_LOOP; i++) {
>        r = rand();
>        random_array[i] = r;
>    }
> 
> //__builtin_popcount
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
>    for (i = 0; i < N_LOOP; i++) {
>        last_bits = __builtin_popcount(random_array[i]);
>    }
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
>    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
>    printf("last-bits:%d\n", last_bits);
> 
> //original ovs count_1bits_32
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
>    for (i = 0; i < N_LOOP; i++) {
>        last_bits = count_1bits_32(random_array[i]);
>    }
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
>    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
>    printf("last-bits:%d\n", last_bits);
> 
> //simple foo function, to count '=' and function time.
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
>    for (i = 0; i < N_LOOP; i++) {
>        last_bits = foo();
>    }
>    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
>    printf("time-diff:%ld\n", end.tv_nsec - start.tv_nsec);
>    printf("last-bits:%d\n", last_bits);
> 
> Compile:
>    gcc bit1.c -o bit1 -march=native -mtune=native -lrt -O2  && ./bit1
> 
> Result:
> 
>    time-diff:1063893 //__builtin_popcount
>    last-bits:10
>    time-diff:293463  //original ovs count_1bits_32
>    last-bits:10
>    time-diff:188     //simple foo function, to count '=' and function time. (maybe it has been optimized out)
>    last-bits:99999
> 
> Result without -O2:
> 
>    time-diff:1317450
>    last-bits:10
>    time-diff:991438
>    last-bits:10
>    time-diff:416265
>    last-bits:99999
> 
> 
> Note I use last_bits to restore the return value, and when I use it,
> performance of __builtin_popcount seems to decrease, I guess compiler
> optimize __builtin_popcount as its wish like -O2.

You could keep the compiler from optimizing the loop body away by accumulating
the result instead of simply assigning it (i.e., “last_bits += …”).
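
A minimal sketch of that idea, assuming gcc on Linux as in the compile command
above. The helper names (elapsed_ns, sum_popcount, time_sum_popcount) are
made up for illustration and are not part of the quoted test program. The
sketch also computes the elapsed time from both tv_sec and tv_nsec, since
subtracting tv_nsec alone only works while the measurement stays within one
second:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* For clock_gettime() under strict -std=. */
#endif

#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two timespecs, using tv_sec and tv_nsec. */
static int64_t
elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t) (end->tv_sec - start->tv_sec) * 1000000000LL
           + (end->tv_nsec - start->tv_nsec);
}

/* Sums the bit counts of all 'n' elements.  Because of the "+=", every
 * iteration contributes to the returned value, so the compiler cannot
 * keep just the last iteration and drop the rest. */
static uint32_t
sum_popcount(const int *values, size_t n)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += __builtin_popcount((unsigned int) values[i]);
    }
    return total;
}

/* Times one pass of sum_popcount() and stores the sum in '*total'.
 * The caller should print '*total' so that the result is actually used. */
static int64_t
time_sum_popcount(const int *values, size_t n, uint32_t *total)
{
    struct timespec start, end;

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    *total = sum_popcount(values, n);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
    return elapsed_ns(&start, &end);
}
```

The same pattern would apply to the count_1bits_32() and foo() loops: sum
into a variable, then print the sum after the timing.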

> 
> So do you think it's enough to represent __builtin_popcount is not
> suitable for __core2?
> 

Seems so, and it also makes sense as Core2 does not have the popcnt instruction.
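
One way to act on this, sketched under assumptions: gcc predefines __POPCNT__
when the target supports the popcnt instruction (e.g., with -mpopcnt or a
-march that includes it), so the builtin can be selected only in that case.
The names popcount32 and popcount32_portable are hypothetical, and the
fallback shown is a generic bit-twiddling count, not the OVS count_1bits_32
implementation:

```c
#include <stdint.h>

/* Portable 32-bit bit count (illustrative fallback, not the OVS code).
 * Adds up bits in 2-, 4-, and 8-bit groups, then sums the four bytes
 * with a multiply. */
static inline unsigned int
popcount32_portable(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0f0f0f0f;
    return (x * 0x01010101) >> 24;
}

/* Uses the builtin only when the compiler targets a CPU with popcnt. */
static inline unsigned int
popcount32(uint32_t x)
{
#ifdef __POPCNT__
    return __builtin_popcount(x);
#else
    return popcount32_portable(x);
#endif
}
```

With -march=native on a Core2, __POPCNT__ would not be defined, so
popcount32() would take the portable path.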

  Jarno

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
