https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67441
            Bug ID: 67441
           Summary: Scheduler unable to disambiguate memory references in
                    unrolled loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: bergner at gcc dot gnu.org, dje at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

The following shows an example where the scheduler is unable to disambiguate
memory references inside the unrolled loop, which prevents any motion of the
loads above the (non-overlapping) preceding stores.

pthaugen@genoa:~/temp/unroll-alias$ cat junk.c
#define SIZE 1024

double x[SIZE] __attribute__ ((aligned (16)));

void do_one(void)
{
  unsigned long i;

  for (i = 0; i < SIZE; i++)
    x[i] = x[i] + 1.0;
}
pthaugen@genoa:~/temp/unroll-alias$ ~/install/gcc/trunk/bin/gcc -O3 -funroll-loops -S junk.c -mcpu=power8

Following is generated, which shows the loop unrolled, but no movement of the
loads/adds, so we basically have back to back copies of the loop body.

.L2:
        lxvd2x 12,0,9
        addi 4,9,16
        addi 11,9,32
        addi 5,9,48
        addi 6,9,64
        addi 7,9,80
        addi 8,9,96
        addi 12,9,112
        xvadddp 1,12,0
        stxvd2x 1,0,9
        addi 9,9,128
        lxvd2x 2,0,4
        xvadddp 3,2,0
        stxvd2x 3,0,4
        lxvd2x 4,0,11
        xvadddp 5,4,0
        stxvd2x 5,0,11
        lxvd2x 6,0,5
        xvadddp 7,6,0
        stxvd2x 7,0,5
        lxvd2x 8,0,6
        xvadddp 9,8,0
        stxvd2x 9,0,6
        lxvd2x 10,0,7
        xvadddp 11,10,0
        stxvd2x 11,0,7
        lxvd2x 13,0,8
        xvadddp 12,13,0
        stxvd2x 12,0,8
        lxvd2x 1,0,12
        xvadddp 2,1,0
        stxvd2x 2,0,12
        bdnz .L2

An example store/load sequence looks like the following at sched1 timeframe,
where r193 coming in was set to r170+64.

(insn 81 80 82 3 (set (mem:V2DF (reg:DI 193 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])
        (reg:V2DF 196 [ vect__5.6 ])) junk.c:12 886 {*vsx_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 196 [ vect__5.6 ])
        (expr_list:REG_DEAD (reg:DI 193 [ ivtmp.14 ])
            (nil))))
(insn 82 81 90 3 (set (reg:DI 197 [ ivtmp.14 ])
        (plus:DI (reg:DI 170 [ ivtmp.14 ])
            (const_int 80 [0x50]))) 81 {*adddi3}
     (nil))
(insn 90 82 91 3 (set (reg:V2DF 199 [ MEM[base: _7, offset: 0B] ])
        (mem:V2DF (reg:DI 197 [ ivtmp.14 ]) [1 MEM[base: _7, offset: 0B]+0 S16 A128])) junk.c:12 886 {*vsx_movv2df}
     (nil))

The store and load use different base regs (r193 and r197), and the fact that
both are derived from r170 plus a displacement is lost when the sched-deps code
looks at just the two mem refs. So it falls back to the tree aliasing oracle,
where both refs carry the same MEM expr with offset 0 and therefore are not
disambiguated.

Not sure which of these is the right fix:

- the unroller should create new tree MEM exprs with appropriate offsets, so
  the mems can be seen as not overlapping, or
- the sched-deps code needs to be enhanced to incorporate the base reg
  increment, so that the rtl base/displ is clearly seen and the refs can be
  disambiguated that way.
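
To make the second option concrete, here is a minimal sketch of the check that becomes possible once both references are expressed as a common base register plus a displacement. It is only an illustration: `struct mem_ref` and `mem_refs_may_overlap` are hypothetical names, not GCC internals, and the sketch assumes the displacements have already been recovered from the `addi`/`plus` insns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a memory reference after the base register
   increment has been folded in.  Not a GCC data structure.  */
struct mem_ref
{
  int base_reg;   /* pseudo number of the common base, e.g. 170 */
  long disp;      /* byte displacement from that base */
  long size;      /* access size in bytes (S16 in the RTL dump) */
};

/* Return true if A and B may touch a common byte.  The answer is
   conservative: references with different bases are assumed to overlap,
   because nothing is known about how the bases relate.  */
static bool
mem_refs_may_overlap (const struct mem_ref *a, const struct mem_ref *b)
{
  assert (a->size > 0 && b->size > 0);

  if (a->base_reg != b->base_reg)
    return true;

  /* Half-open byte ranges [disp, disp + size) intersect iff each one
     starts before the other one ends.  */
  return a->disp < b->disp + b->size && b->disp < a->disp + a->size;
}
```

With the store from insn 81 modelled as `{170, 64, 16}` and the load from insn 90 as `{170, 80, 16}`, the function returns false, which is the information the scheduler would need to move the load above the store. Modelled with their actual base regs (193 and 197), the same two refs conservatively report a possible overlap, matching the behaviour described in this report.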