On Wed, Jun 11, 2014 at 7:45 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote: >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> Sent: Tuesday, June 10, 2014 5:16 PM >> > >> In general this is impossible to do. I don't have a good answer on >> how to determine whether (unaligned) load + bswap is faster than >> doing sth else - but there is a very good chance that the original >> code is even worse. For the unaligned load you can expect >> an optimal code sequence to be generated - likewise for the bswap. >> Now - if you want to do the best for the combination of both I'd >> say you add support to the expr.c bitfield extraction code to do >> the bswap on-the-fly and use TER to see that you are doing the >> bswap on a memory source. > > Oh I see. Doing it there would mean instead of two independent > operations you'd do the best combination possible, is that right?
Yes (but probably it's not worth the trouble). >> >> There is only two choices - disable unaligned-load + bswap on >> SLOW_UNALIGNED_ACCESS targets or not. Doing sth more >> fancy won't do the trick and isn't worth the trouble IMHO. > > There is some other reason to compute the cost that I didn't > mention. For instance, you suggested to recognize partial > load (+bswap). Quoting you: > >> unsigned foo (unsigned char *x) >> { >> return x[0] << 24 | x[2] << 8 | x[3]; >> } >> >> ? We could do an unsigned int load from x and zero byte 3 >> with an AND. > > Even with aligned access, the above might be slower if x[0] was > already loaded previously and sits in a register. Well, I think the pattern detection makes sure that it only follows single-use chains? Or rather that all original loads and bit operations are dead after the transform (not exactly following single-use chains if you'd consider to eventually handle x[0] << 24 | x[0] << 16 | x[2] << 8 | x[3] as a valid permutation)? > I'm tempted to use a simple heuristic such as comparing the > number of loads before and after, adding one if the load is > unaligned. So in the above example, supposing that there is > some computation done around x[0] before the return line, > we'd have 2 loads before Vs 2 x is unaligned and we would > cancel the optimization. If x is aligned the optimization would > proceed. > > Do you thing this approach is also too much trouble or would > not work? I'm not sure. For noop-loads I'd keep them unconditionally, even if unaligned. I'd disable unaligned-load + bswap for now. People interested and sitting on such a target should do the measurements and decide if it's worth the trouble (is arm affected?). But I see that the code currently does not limit itself to single-use chains and thus may end up keeping the whole original code life by unrelated uses. So a good thing would be to impose proper restrictions here. For example, in find_bswap_or_nop_1 do if (TREE_CODE (rhs1) != SSA_NAME || !has_single_use (rhs1)) Richard. > Best regards, > > Thomas > > >