On Fri, Nov 15, 2013 at 03:48:58PM -0800, Ben Pfaff wrote: > On Wed, Nov 13, 2013 at 01:32:42PM -0800, Jarno Rajahalme wrote: > > Add raw_ctz64(), ctz64(), and popcount64() using builtins when > > available. > > > > I'm not sure if the "UINT64_MAX == ~0UL" and "UINT64_MAX == ~0ULL" > > work in all cases as I imagine they would. > > I think you could use ULONG_MAX and ULLONG_MAX, any reason not to? > > > Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
I had another thought. It is really convenient how rightmost_1bit() and zero_rightmost_1bit() work with any width integer. It would be nice if we could just make ctz() and raw_ctz() do the same. I was able to make rightmost_1bit() and zero_rightmost_1bit() work that way because when I did some tests with GCC, I found that GCC was smart enough to compile the code with the cheapest instructions. That is, if you pass in a uint32_t it didn't bother to do 64-bit arithmetic. I see that GCC isn't smart enough to do that with __builtin_ctzll(), but you can fake it with a small amount of manual effort. That is, the following C file: #include <stdint.h> static inline int ctz(unsigned long long int x) { if (__builtin_constant_p(x <= UINT32_MAX) && x <= UINT32_MAX) { return __builtin_ctz(x); } else { return __builtin_ctzll(x); } } int ctzl(unsigned long int x) { return ctz(x); } int ctzll(unsigned long long int x) { return ctz(x); } compiles to this on 32-bit: 00000000 <ctzl>: 0: 0f bc 44 24 04 bsf 0x4(%esp),%eax 5: c3 ret 6: 8d 76 00 lea 0x0(%esi),%esi 9: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi 00000010 <ctzll>: 10: 83 ec 1c sub $0x1c,%esp 13: 8b 44 24 20 mov 0x20(%esp),%eax 17: 8b 54 24 24 mov 0x24(%esp),%edx 1b: 89 04 24 mov %eax,(%esp) 1e: 89 54 24 04 mov %edx,0x4(%esp) 22: e8 fc ff ff ff call 23 <ctzll+0x13> 23: R_386_PC32 __ctzdi2 27: 83 c4 1c add $0x1c,%esp 2a: c3 ret and to this on 64-bit: 0000000000000000 <ctzl>: 0: 48 0f bc c7 bsf %rdi,%rax 4: c3 retq 5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) c: 00 00 00 00 0000000000000010 <ctzll>: 10: 48 0f bc c7 bsf %rdi,%rax 14: c3 retq which is just about perfect. What do you think? Thanks, Ben. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev