On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote:
> > Wouldn't it be better to use some branchless pattern that
> > GCC can also optimize well, like:
> > return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> > (iff _Nd is always a power of two),
>
> _Nd is 20 for one of the INT_N types on msp430, but we could have a
> special case for the rare integer types with unusual sizes.
>
> > or perhaps
> > return (__x << __sN) | (__x >> ((-__sN) % _Nd));
> > which is going to be folded into the above one for power of two
> > constants?
>
> That looks good.
Unfortunately it is not correct if _Nd is not a power of two.
E.g. for __sN 1, -1U % 20 is 15, not 19.  So it would need to be
return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Unfortunately, our rotate pattern recognizer handles
return (__x << __sN) | (__x >> ((-__sN) % _Nd));
or
return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
but doesn't handle the _Nd - __sN case.
Is this C++17+ only?  Then perhaps
  if constexpr ((_Nd & (_Nd - 1)) == 0)
    return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
  return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Verify that on x86_64, for all of unsigned {char,short,int,long long},
you actually get a mere rol? instruction, with perhaps some register
movement but no masking or extra shifts.

> > E.g. ia32intrin.h also uses:
> > /* 64bit rol */
> > extern __inline unsigned long long
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > __rolq (unsigned long long __X, int __C)
> > {
> >   __C &= 63;
> >   return (__X << __C) | (__X >> (-__C & 63));
> > }
> > etc.
>
> Should we delegate to those intrinsics for x86, so that
> __builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when
> relevant?

No, the pattern recognizers should handle even the char/short cases
(for power-of-two bitcounts).  Those intrinsics predate the
improvements in rotate pattern recognition.

	Jakub
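
P.S. A self-contained C++17 sketch of the scheme above, convenient for
checking the generated code; the name __rotl_sketch and the test driver
are just illustration, not the proposed libstdc++ interface:

#include <limits>
#include <cstdio>

// Sketch of the two-pattern rotate discussed above.  For power-of-two
// widths the -__sN & (_Nd - 1) form is what the GCC rotate recognizer
// folds to a single rol; for odd widths such as the 20-bit __int20 the
// exact (_Nd - __sN) % _Nd form is needed, since e.g. -1U % 20 == 15,
// not 19.  As in __rolq, the caller is assumed to have already
// reduced __sN to be less than _Nd.
template<typename _Tp>
  constexpr _Tp
  __rotl_sketch(_Tp __x, unsigned __sN) noexcept
  {
    constexpr unsigned _Nd = std::numeric_limits<_Tp>::digits;
    if constexpr ((_Nd & (_Nd - 1)) == 0)
      return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
    return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
  }

int main()
{
  // Bit 31 rotates around into bit 0: prints 0x2000001.
  std::printf("%#x\n", __rotl_sketch(0x81000000u, 1u));
}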