https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116560
--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So at first glance it appears the bswap pass is kicking in and trying to improve this code. Essentially it wants to do a single 16-bit load out of memory, rotate right by 8 bits, and store. All that seems pretty reasonable, so the gimple optimizers are doing their job correctly.

But gimple->rtl expansion just looks terrible. Instead of a single load we still have two loads, so much of the gain expected from the transformation is ultimately lost. Things just go downhill from there.

Not working on this, but just wanted to record observations.
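
For readers following along, here is a rough sketch of the kind of transformation described above. It is not the testcase from this PR; the function names are hypothetical and chosen only for illustration. The first function swaps two adjacent bytes with byte-wise accesses; the second shows the shape the bswap pass is aiming for, assuming a target where an unaligned 16-bit access is acceptable: one 16-bit load, a rotate by 8 bits, and one 16-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical input shape: swap two adjacent bytes one byte at a time.
   This needs two byte loads and two byte stores if left as written.  */
void swap_bytes_bytewise(uint8_t *p)
{
    uint8_t first = p[0];
    uint8_t second = p[1];
    p[0] = second;
    p[1] = first;
}

/* Roughly what the bswap pass wants instead: a single 16-bit load,
   a rotate by 8 bits, and a single 16-bit store.  memcpy is used so the
   sketch stays valid C for any alignment of p.  */
void swap_bytes_rotate(uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);              /* one 16-bit load */
    v = (uint16_t)((v >> 8) | (v << 8));  /* rotate by 8 bits */
    memcpy(p, &v, sizeof v);              /* one 16-bit store */
}
```

Rotating a 16-bit value by 8 exchanges its two bytes regardless of endianness, so both functions leave the same bytes in memory. The problem noted above is that after expansion to RTL the second form can still end up with two loads rather than one.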