https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64036

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #2)
> An example function, compiling with -O2 -m4:
> 
> int test_0 (unsigned short* x, int y, int z)
> {
>   return 
>      (x[0] + x[1] + x[2] + x[3] + x[4] + x[5] + x[6]
>            + x[7] + x[8] + x[9] + x[10]) ? y : z;
> }
> 
> Without sched1, there are lots of dependencies on the results of memory
> loads.
> 
> With sched1, there is generally more code generated and variable live ranges
> are longer.  The above function will use r8 and r9, which is not really
> necessary.  Memory load dependencies are reduced and more LS/EX/MT
> instructions can be executed in parallel.  Code size for the test function
> increases from ~37 insns to ~50 insns.  Approximated cycles on SH4 pipeline
> should be ~37 cycles without sched1 and ~33 cycles with sched1.  On SH4A the
> latency of a load is 1 cycle, so without sched1 it should be ~28 cycles.

BTW, with AMS there is no difference in code size between sched1 and
no-sched1, because it uses post-inc loads.  With AMS + sched1 the example above
compiles to 38 insns and there are no more dependencies/stalls on the mem
loads.  The r8,r9 usage issue is also gone.
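
For illustration only, here is a source-level sketch of the access pattern that
post-inc loads correspond to.  This is not AMS output, and the name
test_0_postinc is made up for this example.  It assumes the same 11-element
input as test_0 and just walks a pointer instead of using x[0]..x[10], so each
load can be expressed as a post-increment access (e.g. mov.w @Rm+,Rn on SH)
without separate displacement or address calculations.

```c
/* Hypothetical illustration, not compiler output: same result as test_0,
   but written with an explicit post-incremented pointer.  */
int test_0_postinc (unsigned short* x, int y, int z)
{
  const unsigned short* p = x;
  int sum = 0;
  int i;

  /* Each *p++ reads one element and advances the pointer, which is the
     source-level shape of a post-increment load.  */
  for (i = 0; i < 11; ++i)
    sum += *p++;

  return sum ? y : z;
}
```

Whether the compiler actually emits post-inc loads for this form depends on the
optimization passes involved; the sketch only shows the addressing pattern.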
