> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, June 10, 2014 5:16 PM
>
> In general this is impossible to do. I don't have a good answer on
> how to determine whether (unaligned) load + bswap is faster than
> doing something else - but there is a very good chance that the original
> code is even worse. For the unaligned load you can expect
> an optimal code sequence to be generated - likewise for the bswap.
> Now - if you want to do the best for the combination of both I'd
> say you add support to the expr.c bitfield extraction code to do
> the bswap on-the-fly and use TER to see that you are doing the
> bswap on a memory source.

Oh, I see. Doing it there would mean that instead of two independent
operations you'd get the best possible combination, is that right?

> There are only two choices - disable unaligned-load + bswap on
> SLOW_UNALIGNED_ACCESS targets or not. Doing something more
> fancy won't do the trick and isn't worth the trouble IMHO.

There is another reason to compute the cost that I didn't mention.
For instance, you suggested recognizing partial loads (+bswap). Quoting you:

> unsigned foo (unsigned char *x)
> {
>   return x[0] << 24 | x[2] << 8 | x[3];
> }
>
> ? We could do an unsigned int load from x and zero byte 3
> with an AND.

Even with aligned access, the above might be slower if x[0] was already
loaded previously and sits in a register.

I'm tempted to use a simple heuristic such as comparing the number of
loads before and after, adding one if the load is unaligned. So in the
above example, supposing that there is some computation done around x[0]
before the return line, we'd have 2 loads before vs. 2 after if x is
unaligned, and we would cancel the optimization. If x is aligned, the
optimization would proceed.

Do you think this approach is also too much trouble, or would it not work?

Best regards,

Thomas
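
To make the load-counting heuristic above concrete, here is a minimal
sketch in C. It is only an illustration of the idea, not GCC code: the
names merged_load_cost and should_merge_loads are hypothetical, and it
assumes that the merged access costs one load, plus one more if it is
unaligned, and that values already sitting in a register are not
counted as loads on the "before" side.

```c
#include <stdbool.h>

/* Hypothetical cost model (not part of GCC): the merged access is one
   load, with one extra unit charged when that load is unaligned.  */
static unsigned
merged_load_cost (bool unaligned)
{
  return 1u + (unaligned ? 1u : 0u);
}

/* LOADS_BEFORE is the number of loads the original sequence still has
   to issue; bytes already held in a register are assumed not to count.
   Merge only if the merged form is strictly cheaper.  */
static bool
should_merge_loads (unsigned loads_before, bool unaligned)
{
  return merged_load_cost (unaligned) < loads_before;
}
```

For foo with x[0] already in a register, only x[2] and x[3] remain to be
loaded, so loads_before is 2: should_merge_loads (2, false) accepts the
transformation (cost 1 vs. 2), while should_merge_loads (2, true)
rejects it (cost 2 vs. 2).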