Re: [ovs-dev] [PATCH] util: Better count_1bits().

2013-12-06 Thread Jarno Rajahalme
Pushed as-is, leaving Ben’s preference as an opportunity for future improvement.
  Jarno
On Dec 6, 2013, at 9:24 AM, Ben Pfaff wrote:
> On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
>> Inline, use another well-known algorithm for 64-bit builds, and use
>> builtins when they a

Re: [ovs-dev] [PATCH] util: Better count_1bits().

2013-12-06 Thread Ben Pfaff
On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
> Inline, use another well-known algorithm for 64-bit builds, and use
> builtins when they are known to be fast at compile time. A 32-bit
> version of the alternate algorithm is slower than the existing
> implementation, so the old o

Re: [ovs-dev] [PATCH] util: Better count_1bits().

2013-12-06 Thread Jarno Rajahalme
On Dec 6, 2013, at 7:36 AM, Jarno Rajahalme wrote:
> On Dec 5, 2013, at 4:44 PM, Ben Pfaff wrote:
>> How did you measure the benefit of inlining?
>
> With a standalone C program running different variants (original, inlined,
> different algorithm) over
I should mention that I linked with a st

Re: [ovs-dev] [PATCH] util: Better count_1bits().

2013-12-06 Thread Jarno Rajahalme
On Dec 5, 2013, at 4:44 PM, Ben Pfaff wrote:
> On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
>> Inline, use another well-known algorithm for 64-bit builds, and use
>> builtins when they are known to be fast at compile time. A 32-bit
>> version of the alternate algorithm is sl

Re: [ovs-dev] [PATCH] util: Better count_1bits().

2013-12-05 Thread Ben Pfaff
On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
> Inline, use another well-known algorithm for 64-bit builds, and use
> builtins when they are known to be fast at compile time. A 32-bit
> version of the alternate algorithm is slower than the existing
> implementation, so the old o

[ovs-dev] [PATCH] util: Better count_1bits().

2013-12-05 Thread Jarno Rajahalme
Inline, use another well-known algorithm for 64-bit builds, and use builtins when they are known to be fast at compile time. A 32-bit version of the alternate algorithm is slower than the existing implementation, so the old one is used for 32-bit builds. Inline assembler would be a bit faster on
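For readers without the patch at hand, the approach described in the commit message might look roughly like the sketch below. This is not the code from the patch: the function names `count_1bits_swar64` and `count_1bits_sketch` are hypothetical, the choice of the SWAR (parallel bit-summing) method as the "well-known algorithm" is an assumption, and so is the use of the `__POPCNT__` macro as the compile-time test for a fast builtin.

```c
#include <stdint.h>

/* Portable 64-bit population count using the parallel bit-summing (SWAR)
 * method.  Assumed here for illustration; the thread snippet does not say
 * which algorithm the patch actually uses. */
static inline unsigned int
count_1bits_swar64(uint64_t x)
{
    const uint64_t h55 = UINT64_C(0x5555555555555555);
    const uint64_t h33 = UINT64_C(0x3333333333333333);
    const uint64_t h0f = UINT64_C(0x0f0f0f0f0f0f0f0f);
    const uint64_t h01 = UINT64_C(0x0101010101010101);

    x -= (x >> 1) & h55;                /* 2-bit sums of adjacent bits. */
    x = (x & h33) + ((x >> 2) & h33);   /* 4-bit sums. */
    x = (x + (x >> 4)) & h0f;           /* 8-bit sums, one per byte. */
    return (unsigned int) ((x * h01) >> 56); /* Add all bytes into the top byte. */
}

/* Hypothetical wrapper: use the compiler builtin only when the target is
 * known at compile time to have a popcount instruction (assumption: GCC or
 * Clang defining __POPCNT__), otherwise fall back to the portable code. */
static inline unsigned int
count_1bits_sketch(uint64_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    return (unsigned int) __builtin_popcountll(x);
#else
    return count_1bits_swar64(x);
#endif
}
```

Keeping both functions `static inline` in a header lets the compiler fold the call into hot loops, which is the inlining benefit discussed in the replies. Without a compile-time guarantee of a hardware instruction, `__builtin_popcountll` may compile to a library call, which is one reason to guard it.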