On 06/12/2013 10:39, Alexander Wu wrote:
On 06/12/2013 01:47, Ben Pfaff wrote:
On Wed, Dec 04, 2013 at 05:38:29PM +0800, Alexander Wu wrote:
V4:
Add bitmap_count1 function to count all 1 bits.
I suggest using the count_1bits() function to speed this up.
Thanks,
Ben.
Hi Ben,
Thanks for your suggestion! I'll update the function with more
efficiently algorithm. But I found a problem.
| type | length
----------------------------------------------
bitmap_* | unsigned long * | L32
count1_bits | uint64_t | 64
* L means Least
Do you have suggestion about how to use count1_bits with unsigned
long safely?
BTW, the algorithm hamming weight looks good:
http://en.wikipedia.org/wiki/Hamming_weight
Best regards,
Alexander Wu
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Some additional test for bitmap performance (10 million loops):
suse-kvm-of13:/test # time ./bit3
real 0m0.061s
user 0m0.060s
sys 0m0.000s
suse-kvm-of13:/test # time ./bit2
real 0m0.088s
user 0m0.084s
sys 0m0.000s
suse-kvm-of13:/test # time ./bit1
real 0m0.080s
user 0m0.072s
sys 0m0.004s
[bit1]: count1_bits
[bit2]: NumberOfSetBits from http://stackoverflow.com/a/109025
[bit3]: precomputed16_bitcount from
http://gurmeet.net/puzzles/fast-bit-counting-routines/bitcount.c
After reading the implements, I understand how precomputed16_bitcount wins.
count_1bits_32 inits 256 bytes array, while precomputed16_bitcount
inits 65536 bytes array and reduce the calculation:
count1_bits:
return (count_1bits_8[x & 0xff] +
count_1bits_8[(x >> 8) & 0xff] +
count_1bits_8[(x >> 16) & 0xff] +
count_1bits_8[x >> 24]);
precomputed16_bitcount:
return bits_in_16bits [n & 0xffffu]
+ bits_in_16bits [(n >> 16) & 0xffffu] ;
So how about change to the fastest one?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev