On Tue, Jul 03, 2018 at 10:02:47PM +0100, Jonathan Wakely wrote:
> +#ifndef _GLIBCXX_BIT
> +#define _GLIBCXX_BIT 1
> +
> +#pragma GCC system_header
> +
> +#if __cplusplus >= 201402L
> +
> +#include <type_traits>
> +#include <limits>
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> + template<typename _Tp>
> + constexpr _Tp
> + __rotl(_Tp __x, unsigned int __s) noexcept
> + {
> + constexpr auto _Nd = numeric_limits<_Tp>::digits;
> + const unsigned __sN = __s % _Nd;
> + if (__sN)
> + return (__x << __sN) | (__x >> (_Nd - __sN));
Wouldn't it be better to use a branchless pattern that
GCC can also optimize well, such as:
return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
(if and only if _Nd is always a power of two), or perhaps:
return (__x << __sN) | (__x >> ((-__sN) % _Nd));
which will be folded into the former for power-of-two constants?
E.g. ia32intrin.h also uses:
/* 64bit rol */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rolq (unsigned long long __X, int __C)
{
__C &= 63;
return (__X << __C) | (__X >> (-__C & 63));
}
etc.
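
For illustration, the masked form suggested above could be written as a
standalone C++14 function roughly as follows. This is only a sketch: the name
rotl_branchless is hypothetical, and the static_asserts (unsigned type,
power-of-two width) are assumptions added here, not part of the patch.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper, not the libstdc++ implementation: rotate left
// without a branch on the shift count.
template<typename T>
constexpr T
rotl_branchless(T x, unsigned int s) noexcept
{
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  constexpr unsigned Nd = std::numeric_limits<T>::digits;
  // Assumption: the mask trick below is only valid for power-of-two widths.
  static_assert((Nd & (Nd - 1)) == 0, "width must be a power of two");

  const unsigned r = s & (Nd - 1);   // same as s % Nd for power-of-two Nd
  // (-r) & (Nd - 1) is Nd - r for r != 0 and 0 for r == 0, so neither
  // shift count ever reaches Nd and no conditional is needed.
  return static_cast<T>((x << r) | (x >> ((-r) & (Nd - 1))));
}

// Compile-time checks of the sketch.
static_assert(rotl_branchless<std::uint32_t>(0x80000001u, 1) == 0x00000003u, "");
static_assert(rotl_branchless<std::uint32_t>(0x12345678u, 0) == 0x12345678u, "");
static_assert(rotl_branchless<std::uint32_t>(0x12345678u, 32) == 0x12345678u, "");
```

With r == 0 both shifts are by zero, so the result is just x, which is what
the "if (__sN)" test in the patch handles separately.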
Jakub