> From: Richard Biener [mailto:richard.guent...@gmail.com]
> 
> It may do three aligned loads, char, short, char and combine them
> while doing an unaligned int load may end up being slower.  Though
> very probable the RTL expansion machinery for unaligned loads
> is way more clever to emit an optimal sequence than a programmer is.

That's what I meant. I expect the RTL machinery to emit the optimal sequence
to do an unaligned load. If it doesn't, it should probably be fixed.
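
To illustrate the trade-off at source level, here is a minimal C sketch. It is
not code from the patch or from GCC. The function names are hypothetical, and
the byte-wise version only stands in for the "several narrow aligned loads"
strategy rather than the exact char/short/char split.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Narrow loads only: each access is a single byte, so alignment never
   matters. The bytes are combined as a little-endian 32-bit value. */
uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* One possibly unaligned 32-bit load in host byte order. memcpy keeps the
   access well-defined in C; the compiler chooses the instruction sequence. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sanity check at an odd offset; the second result depends on host
   endianness, so both byte orders are accepted. */
void check_unaligned_loads(void)
{
    unsigned char buf[8] = { 0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00 };
    uint32_t v = load_u32_unaligned(buf + 1);

    assert(load_le32_bytes(buf + 1) == 0x12345678u);
    assert(v == 0x12345678u || v == 0x78563412u);
}
```

Which of the two is faster depends on the target, which is why the choice is
better left to the expander and a cost model than hard-coded in the source.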

> 
> Anyway, as said before please consider addressing any cost issues
> as followup - just make sure to properly emit unaligned loads via
> a sequence I suggested.

That's what I did by skipping the optimization for bswap permutations on
targets with slow unaligned access. Since this affects ARM, you can be sure
I'll follow up on the cost model. I've already started thinking about it.

I'll send the new patch as soon as all the tests are done.

Best regards,

Thomas


