> From: Richard Biener [mailto:richard.guent...@gmail.com]
>
> It may do three aligned loads, char, short, char and combine them
> while doing an unaligned int load may end up being slower.  Though
> very probable the RTL expansion machinery for unaligned loads
> is way more clever to emit an optimal sequence than a programmer is.

That's what I meant. I expect the RTL machinery to emit the optimal
sequence to do an unaligned load. If it doesn't, it should probably be
fixed.

> Anyway, as said before please consider addressing any cost issues
> as followup - just make sure to properly emit unaligned loads via
> a sequence I suggested.

That's what I did with the addition of skipping the optimization for
targets with slow unaligned access for bswap permutation. Since this
affects ARM you can be sure I'll follow up on the cost model. I already
started thinking about it.

I'll send the new patch as soon as all the tests are done.

Best regards,

Thomas
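
P.S. For illustration only, and not taken from the patch: a minimal C sketch of the two source-level forms being compared in this thread. The function names are hypothetical. The first combines narrow loads by hand (a simplified byte-by-byte variant of the char, short, char sequence mentioned above); the second leaves the choice of load sequence to the compiler via memcpy. The two only return the same value on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a 32-bit little-endian value from four
   single-byte loads.  Works for any alignment of p and gives the same
   result regardless of host byte order.  */
static uint32_t
load_u32_le_bytes (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: read 32 bits in host byte order from a possibly
   unaligned address.  memcpy keeps the access well defined; how it is
   expanded (one unaligned load or several narrower ones) is left to
   the compiler and the target.  */
static uint32_t
load_u32_unaligned (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Runtime check of host byte order, used only to compare the two
   helpers above.  */
static int
host_is_little_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 1;
}
```

Which of the two forms is cheaper is exactly the target-dependent cost question discussed above, so this sketch makes no claim either way.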