https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511
--- Comment #2 from Matthijs Kooijman <matthijs at stdin dot nl> ---
So, IIUC, this is quite hard to fix? Either you use lib functions, which prevents the optimizer from simply relabeling or copying registers to implement the shift, or you don't, and then more complex operations become very verbose and messy?

Would it make sense (and be possible) to add a special case that avoids the lib functions for shifts by a constant number of bits that is also a multiple of 8? At first glance, that would make a lot of common cases (where an integer is decomposed into separate bytes or other parts) a lot faster, while still keeping the lib functions for more complex operations?
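To illustrate why whole-byte shifts are a special case: on an 8-bit target a uint64_t lives in eight byte-sized registers, and a shift by 8*n bits only moves whole bytes between them. The sketch below models this in portable C. It is not GCC's code; the type `u64_regs` and the helpers `to_regs`, `from_regs`, `shr_bytes` and `shl_bytes` are hypothetical names used only to show that such a shift needs no bit-level shifting, just register copies (or renaming) plus zero-filling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a uint64_t held in eight 8-bit registers,
   index 0 = least significant byte. */
typedef struct { uint8_t b[8]; } u64_regs;

static u64_regs to_regs(uint64_t x) {
    u64_regs r;
    for (int i = 0; i < 8; i++) {
        r.b[i] = (uint8_t)(x >> (8 * i)); /* byte i of x */
    }
    return r;
}

static uint64_t from_regs(u64_regs r) {
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--) {
        x = (x << 8) | r.b[i]; /* reassemble, most significant byte first */
    }
    return x;
}

/* Logical right shift by 8*n bits using only byte copies:
   result byte i is input byte i + n; the top n bytes become zero.
   Assumes n < 8. */
static u64_regs shr_bytes(u64_regs in, unsigned n) {
    u64_regs out;
    for (unsigned i = 0; i < 8; i++) {
        out.b[i] = (i + n < 8) ? in.b[i + n] : 0;
    }
    return out;
}

/* Left shift by 8*n bits using only byte copies:
   result byte i is input byte i - n; the low n bytes become zero.
   Assumes n < 8. */
static u64_regs shl_bytes(u64_regs in, unsigned n) {
    u64_regs out;
    for (unsigned i = 0; i < 8; i++) {
        out.b[i] = (i >= n) ? in.b[i - n] : 0;
    }
    return out;
}
```

For example, `from_regs(shr_bytes(to_regs(x), 2))` equals `x >> 16`: each result byte is just a copy of an input byte, which is the kind of code one would hope the compiler emits inline instead of calling a generic shift routine.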