https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209
--- Comment #7 from ktkachov at gcc dot gnu.org ---
The other thing to consider with merging loads is how the result is used. In your example, if you merge the 16-bit loads into a single 32-bit register load, you'll have to add instructions to extract the low and high parts into separate registers in order to add them together, and that can end up being more expensive overall.
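
To make the trade-off concrete, here is a rough C sketch. It is not the testcase from this report: the function names sum_separate and sum_merged are made up for illustration, and the actual instruction counts depend on the target and on what the compiler does with each form.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the testcase from the report. */

/* Two separate 16-bit loads: each value is zero-extended into its own
   register by the load itself, so the add can use them directly. */
uint32_t sum_separate(const uint16_t *p)
{
    return (uint32_t)p[0] + (uint32_t)p[1];
}

/* One merged 32-bit load: both halves arrive in a single register and
   have to be split out (a mask and a shift) before they can be added. */
uint32_t sum_merged(const uint16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);        /* single 32-bit load */
    uint32_t a = w & 0xFFFFu;       /* extract one half */
    uint32_t b = w >> 16;           /* extract the other half */
    return a + b;                   /* order of halves does not matter for a sum */
}
```

Both functions return the same value, but the merged form trades one load for two extra extraction operations, which is the cost described above.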