http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

--- Comment #3 from philb at gnu dot org 2011-08-03 10:38:28 UTC ---
(In reply to comment #2)
> This looks like it might be to do with the latency of the call instruction at
> least for the LPIC0 case. The scheduler thinks that r0 isn't ready really till
> cycle 34 or so and hence the compiler can't hoist the mov r5, r0 above the add
> r4, pc, r4 . 

That seems rather peculiar.  The worst case behaviour that the called function
is likely to have would be something like:

ldr r0, [r1]
bx lr

It's possible that the ldr might have a result latency of up to four cycles (if
it were an ARM1136 unaligned access), but the bx will take a minimum of four
cycles even if it is correctly predicted by the return stack, so the result
latency of the ldr will effectively be hidden by the time control returns to
the caller.  So, as far as the scheduler is concerned, it seems as though the
result latency of the call instruction should be considered to be one.
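
To make the arithmetic explicit, here is a rough sketch of that reasoning.  It
is a simplified model, not anything taken from GCC's scheduler: the function
name and the assumption that the bx cycles fully overlap the ldr result latency
are mine, and the floor of one cycle just reflects the conclusion above.

```python
def residual_call_latency(load_result_latency: int, return_branch_cycles: int) -> int:
    """Cycles the caller still waits for r0 once the callee has returned.

    Hypothetical helper: assumes the return branch overlaps the load's
    result latency, and that a dependency costs at least one cycle.
    """
    # Part of the load latency not yet covered when the return completes.
    uncovered = max(0, load_result_latency - return_branch_cycles)
    # Never report less than one cycle for the dependency on r0.
    return max(1, uncovered)


# Figures from the comment: ldr up to 4 cycles, bx at least 4 cycles.
print(residual_call_latency(4, 4))
```

With the figures quoted above, nothing of the ldr latency is left over after
the bx, so the model gives a latency of one rather than 34.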
