On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely <jwak...@redhat.com> wrote: > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely <jwak...@redhat.com> wrote: > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin <d...@ilvokhin.com> wrote: > > > > > > Instead of looping over every byte of the tail, unroll loop manually > > > using switch statement, then compilers (at least GCC and Clang) will > > > generate a jump table [1], which is faster on a microbenchmark [2]. > > > > > > [1]: https://godbolt.org/z/aE8Mq3j5G > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24 > > > > > > libstdc++-v3/ChangeLog: > > > > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll > > > loop using switch statement. > > > > > > Signed-off-by: Dmitry Ilvokhin <d...@ilvokhin.com> > > > --- > > > libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++++++++++++++++++++++---- > > > 1 file changed, 23 insertions(+), 4 deletions(-) > > > > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc > > > b/libstdc++-v3/libsupc++/hash_bytes.cc > > > index 3665375096a..294a7323dd0 100644 > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc > > > @@ -50,10 +50,29 @@ namespace > > > load_bytes(const char* p, int n) > > > { > > > std::size_t result = 0; > > > - --n; > > > - do > > > - result = (result << 8) + static_cast<unsigned char>(p[n]); > > > - while (--n >= 0); > > > > Don't we still need to loop, for the case where n >= 8? Otherwise we > > only hash the first 8 bytes. > > Ah, but it's only ever called with load_bytes(end, len & 0x7)
The compiler should do such transforms - you probably want to tell it that n < 8 though, it likely doesn't (always) know. > > > > > > > + switch(n & 7) > > > + { > > > + case 7: > > > + result |= std::size_t(p[6]) << 48; > > > + [[gnu::fallthrough]]; > > > + case 6: > > > + result |= std::size_t(p[5]) << 40; > > > + [[gnu::fallthrough]]; > > > + case 5: > > > + result |= std::size_t(p[4]) << 32; > > > + [[gnu::fallthrough]]; > > > + case 4: > > > + result |= std::size_t(p[3]) << 24; > > > + [[gnu::fallthrough]]; > > > + case 3: > > > + result |= std::size_t(p[2]) << 16; > > > + [[gnu::fallthrough]]; > > > + case 2: > > > + result |= std::size_t(p[1]) << 8; > > > + [[gnu::fallthrough]]; > > > + case 1: > > > + result |= std::size_t(p[0]); > > > + }; > > > return result; > > > } > > > > > > -- > > > 2.43.5 > > > >