On Wed, Jun 11, 2014 at 7:45 AM, Thomas Preud'homme
<thomas.preudho...@arm.com> wrote:
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, June 10, 2014 5:16 PM
>>
>
>> In general this is impossible to do.  I don't have a good answer on
>> how to determine whether (unaligned) load + bswap is faster than
>> doing sth else - but there is a very good chance that the original
>> code is even worse.  For the unaligned load you can expect
>> an optimal code sequence to be generated - likewise for the bswap.
>> Now - if you want to do the best for the combination of both I'd
>> say you add support to the expr.c bitfield extraction code to do
>> the bswap on-the-fly and use TER to see that you are doing the
>> bswap on a memory source.
>
> Oh I see. Doing it there would mean instead of two independent
> operations you'd do the best combination possible, is that right?

Yes (but probably it's not worth the trouble).

>>
>> There are only two choices - disable unaligned-load + bswap on
>> SLOW_UNALIGNED_ACCESS targets or not.  Doing sth more
>> fancy won't do the trick and isn't worth the trouble IMHO.
>
> There is some other reason to compute the cost that I didn't
> mention. For instance, you suggested to recognize partial
> load (+bswap). Quoting you:
>
>> unsigned foo (unsigned char *x)
>> {
>>   return x[0] << 24 | x[2] << 8 | x[3];
>> }
>>
>> ?  We could do an unsigned int load from x and zero byte 3
>> with an AND.
>
> Even with aligned access, the above might be slower if x[0] was
> already loaded previously and sits in a register.

Well, I think the pattern detection makes sure that it only follows
single-use chains?  Or rather that all original loads and bit
operations are dead after the transform (not exactly following
single-use chains if you'd consider to eventually handle
x[0] << 24 | x[0] << 16 | x[2] << 8 | x[3] as a valid permutation)?
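
To make the partial-load example concrete, here is a minimal sketch of the
two forms side by side. It is a hand-written illustration, not what the
bswap pass emits: the names foo_bytes and foo_wide are made up, and it
assumes a GCC-compatible compiler (for __builtin_bswap32 and the
__BYTE_ORDER__ macro) and that all four bytes at x are readable, since the
wide form also reads x[1].

```c
#include <stdint.h>
#include <string.h>

/* Original form: three byte loads combined with shifts and ORs.  */
static uint32_t
foo_bytes (const unsigned char *x)
{
  return (uint32_t) x[0] << 24 | (uint32_t) x[2] << 8 | x[3];
}

/* Hand-written equivalent (illustrative only): one 4-byte load, a byte
   swap on little-endian hosts, and an AND that clears the byte that
   came from x[1].  */
static uint32_t
foo_wide (const unsigned char *x)
{
  uint32_t v;
  memcpy (&v, x, sizeof v);     /* alignment-safe 32-bit load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap32 (v);    /* bring x[0] into the top byte */
#endif
  return v & 0xff00ffffu;       /* drop the x[1] byte */
}
```

If x[0] is already in a register for other reasons, foo_bytes only needs
two further loads, which is the case Thomas is worried about.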

> I'm tempted to use a simple heuristic such as comparing the
> number of loads before and after, adding one if the load is
> unaligned. So in the above example, supposing that there is
> some computation done around x[0] before the return line,
> we'd have 2 loads before vs. 2 after if x is unaligned, and we would
> cancel the optimization. If x is aligned the optimization would
> proceed.
>
> Do you think this approach is also too much trouble or would
> not work?

I'm not sure.  For noop-loads I'd keep them unconditionally, even if
unaligned.  I'd disable unaligned-load + bswap for now.  People
interested and sitting on such a target should do the measurements
and decide if it's worth the trouble (is arm affected?).
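
For reference, the load-counting heuristic proposed in the quoted text
could be sketched roughly as below. This is only one reading of the
proposal: worth_transforming is a hypothetical helper, not a GCC function,
and the rule that a tie cancels the transform is inferred from the
"2 vs. 2" example.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: decide whether replacing
   LOADS_BEFORE narrow loads with LOADS_AFTER wide loads is a win.
   An unaligned wide load is charged as one extra load.  */
static bool
worth_transforming (unsigned loads_before, unsigned loads_after,
                    bool unaligned)
{
  unsigned cost_after = loads_after + (unaligned ? 1 : 0);
  /* Proceed only when the transform strictly reduces the count.  */
  return cost_after < loads_before;
}
```

With x[0] already loaded for other uses, the example has 2 remaining loads
before and 1 wide load after, so worth_transforming (2, 1, true) rejects
the transform and worth_transforming (2, 1, false) accepts it.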

But I see that the code currently does not limit itself to single-use
chains and thus may end up keeping the whole original code live
because of unrelated uses.  So a good thing would be to impose proper
restrictions here.  For example, in find_bswap_or_nop_1 do

  if (TREE_CODE (rhs1) != SSA_NAME
     || !has_single_use (rhs1))
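
To illustrate why the single-use restriction matters, here is a toy model
of the check. It deliberately does not use GCC's tree or SSA data
structures: struct toy_value and toy_single_use_chain are invented for
this sketch, and num_uses stands in for what has_single_use would tell you
about a real SSA name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SSA value; NOT GCC's tree representation.  */
struct toy_value
{
  int num_uses;                    /* statements that read this value */
  const struct toy_value *ops[2];  /* operands, NULL for none (a load) */
};

/* Return true if V and everything feeding it has exactly one use, so
   the whole expression becomes dead once its single consumer is
   replaced by the wide load (+ bswap).  */
static bool
toy_single_use_chain (const struct toy_value *v)
{
  if (v == NULL)
    return true;
  if (v->num_uses != 1)
    return false;
  return toy_single_use_chain (v->ops[0])
         && toy_single_use_chain (v->ops[1]);
}
```

If any intermediate value has a second, unrelated use, the walk returns
false: the original loads and shifts would stay live next to the new load,
so the transform would add work instead of removing it.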

Richard.

> Best regards,
>
> Thomas
>
>
>
