David Laight wrote:

> Indeed, either:
>     ((b * 0x80200802ull) & 0x0884422110ull) * 0x0101010101ull >> 32
> or
>     ((b * 0x0802u & 0x22110u) | (b * 0x8020u & 0x88440u)) * 0x10101u >> 16
> are probably best - probably the 2nd since it avoids a 64x64 multiply.
A nice solution. I like it.

Unfortunately the second method doesn't work for me. It reverses both
nibbles of a byte separately instead of the whole byte. E.g. from 0x01
it makes 0x08.

Once it works it should probably be in libkern.

> It is also worth allowing for cpus that can have a hardware instruction
> (and then do it in 1 clock!)

Indeed some CPUs can do that. I know that the ColdFire has a bitrev
instruction. Such code can go into libkern/arch.

-- 
Frank Wille
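To check the two quoted expressions byte by byte against a plain loop, a minimal sketch could look like the following. The function names (rev8_loop, rev8_mul64, rev8_mul32, rev8_mismatches) are hypothetical and not taken from libkern. The sketch also makes two assumptions that the thread does not state: the intermediates are at least 32 bits wide (64 bits for the first expression), and only the low 8 bits of the result are kept.

```c
#include <stdint.h>

/* Reference: reverse the 8 bits of b one bit at a time. */
static uint8_t rev8_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* First quoted expression.
 * Assumption: 64-bit intermediate, result truncated to 8 bits. */
static uint8_t rev8_mul64(uint8_t b)
{
    return (uint8_t)((((uint64_t)b * 0x80200802ull) & 0x0884422110ull)
                     * 0x0101010101ull >> 32);
}

/* Second quoted expression.
 * Assumption: 32-bit unsigned intermediate, result truncated to 8 bits. */
static uint8_t rev8_mul32(uint8_t b)
{
    uint32_t x = b;
    return (uint8_t)(((x * 0x0802u & 0x22110u) | (x * 0x8020u & 0x88440u))
                     * 0x10101u >> 16);
}

/* Count how often either expression disagrees with the loop
 * over all 256 byte values. */
static int rev8_mismatches(void)
{
    int bad = 0;
    for (unsigned v = 0; v < 256; v++) {
        uint8_t want = rev8_loop((uint8_t)v);
        if (rev8_mul64((uint8_t)v) != want)
            bad++;
        if (rev8_mul32((uint8_t)v) != want)
            bad++;
    }
    return bad;
}
```

Calling rev8_mismatches() from a small test program and printing the return value shows whether the expressions, as typed and with these types, agree with the loop. Dropping the uint8_t cast or using a narrower intermediate type changes what the expressions return, so those two points are worth checking first when a result looks wrong.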
