On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin <d...@ilvokhin.com> wrote:
>
> Instead of looping over every byte of the tail, unroll loop manually
> using switch statement, then compilers (at least GCC and Clang) will
> generate a jump table [1], which is faster on a microbenchmark [2].
>
> [1]: https://godbolt.org/z/aE8Mq3j5G
> [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
>
> libstdc++-v3/ChangeLog:
>
>         * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
>           loop using switch statement.
>
> Signed-off-by: Dmitry Ilvokhin <d...@ilvokhin.com>
> ---
>  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
> index 3665375096a..294a7323dd0 100644
> --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> @@ -50,10 +50,29 @@ namespace
>    load_bytes(const char* p, int n)
>    {
>      std::size_t result = 0;
> -    --n;
> -    do
> -      result = (result << 8) + static_cast<unsigned char>(p[n]);
> -    while (--n >= 0);

Don't we still need to loop, for the case where n >= 8? Otherwise we
only hash the first 8 bytes.

> +    switch(n & 7)
> +      {
> +      case 7:
> +       result |= std::size_t(p[6]) << 48;
> +       [[gnu::fallthrough]];
> +      case 6:
> +       result |= std::size_t(p[5]) << 40;
> +       [[gnu::fallthrough]];
> +      case 5:
> +       result |= std::size_t(p[4]) << 32;
> +       [[gnu::fallthrough]];
> +      case 4:
> +       result |= std::size_t(p[3]) << 24;
> +       [[gnu::fallthrough]];
> +      case 3:
> +       result |= std::size_t(p[2]) << 16;
> +       [[gnu::fallthrough]];
> +      case 2:
> +       result |= std::size_t(p[1]) << 8;
> +       [[gnu::fallthrough]];
> +      case 1:
> +       result |= std::size_t(p[0]);
> +      };
>      return result;
>    }
>
> --
> 2.43.5
>
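
To make the n >= 8 question concrete, here is a minimal standalone sketch (C++17), not the libstdc++ source. The names load_bytes_loop, load_bytes_switch and check_tail_equivalence are hypothetical. It uses std::uint64_t rather than std::size_t so the shifts stay valid on 32-bit targets, and it keeps the unsigned char cast from the removed loop, since on targets where plain char is signed a bare std::size_t(p[i]) would sign-extend bytes >= 0x80. Assuming the function is only ever called for the tail, with n between 1 and 7, the two forms should agree. For n == 8 they do not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte-by-byte loop, equivalent to the removed lines (requires n >= 1).
inline std::uint64_t load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Switch-based variant. Unlike the quoted patch, each byte goes through
// unsigned char first (an assumption added here) to avoid sign extension.
inline std::uint64_t load_bytes_switch(const char* p, int n)
{
  auto byte = [p](int i) {
    return std::uint64_t(static_cast<unsigned char>(p[i]));
  };
  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= byte(6) << 48; [[fallthrough]];
    case 6: result |= byte(5) << 40; [[fallthrough]];
    case 5: result |= byte(4) << 32; [[fallthrough]];
    case 4: result |= byte(3) << 24; [[fallthrough]];
    case 3: result |= byte(2) << 16; [[fallthrough]];
    case 2: result |= byte(1) << 8;  [[fallthrough]];
    case 1: result |= byte(0);
    }
  return result;
}

// Compares both variants on a buffer that includes bytes >= 0x80.
inline void check_tail_equivalence()
{
  const unsigned char raw[8] = {0x01, 0x80, 0xff, 0x7f, 0x10, 0xab, 0xcd, 0x42};
  const char* p = reinterpret_cast<const char*>(raw);

  // Tail lengths 1..7: the loop and the switch produce the same value.
  for (int n = 1; n <= 7; ++n)
    assert(load_bytes_loop(p, n) == load_bytes_switch(p, n));

  // n == 8: n & 7 is 0, so no case matches and nothing is loaded,
  // while the loop still consumes all eight bytes.
  assert(load_bytes_switch(p, 8) == 0);
  assert(load_bytes_loop(p, 8) != 0);
}
```

Calling check_tail_equivalence() from a test driver exercises both properties. Whether the n == 8 difference matters depends on whether any caller passes a length outside 1..7.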
