I was looking into some bitfield code for aarch64 and was wondering why SLOW_BYTE_ACCESS is set to 0, but I can't figure out the reason. The comment in the header says:

   Although there's no difference in instruction count or cycles,
   in AArch64 we don't want to expand to a sub-word to a 64-bit access
   if we don't have to, for power-saving reasons.  */

But that does not make sense, because with SLOW_BYTE_ACCESS set to 0, GCC does expand a sub-word access to a 64-bit access. When I set SLOW_BYTE_ACCESS to 1, I get a speedup of between 38% and 208% for bitfield accesses inside a loop on ThunderX CN88xx.

Should we change SLOW_BYTE_ACCESS (or maybe better yet, get rid of it)?

Thanks,
Andrew Pinski
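
P.S. For readers who want something concrete to compile, here is a minimal sketch of the general kind of loop discussed above: a bitfield read on every iteration. It is an illustration only. The struct layout, the field names, and the function sum_counts are hypothetical and are not taken from the benchmark behind the numbers quoted above.

```c
#include <stddef.h>

/* Hypothetical record: three bitfields sharing one 32-bit storage unit. */
struct flags {
    unsigned int kind  : 4;
    unsigned int count : 12;
    unsigned int extra : 16;
};

/* Sum one bitfield across an array.  Every iteration has to load the
   memory holding 'count' and then extract the 12-bit field.  How wide
   that load is (sub-word or a wider access) is the compiler's choice,
   and that choice is what the SLOW_BYTE_ACCESS question is about.  */
unsigned long sum_counts(const struct flags *v, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i].count;
    return total;
}
```

Compiling a file like this to assembly (for example with -O2 -S) under each setting of the macro and comparing the loads in the loop body is one way to see which access width was chosen.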