https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117718
Bug ID: 117718
Summary: Inefficient address computation for d-form vector loads
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bergner at gcc dot gnu.org
Target Milestone: ---

If we compile some simple test cases returning the value from a global
vector array, we fail to fold the low 16 bits of the offset into the
displacement of the lxv instruction (new in Power9). Instead, we do the
full offset computation outside of the load and then use an offset of
zero for the lxv.

bergner@c643n10lp1:~$ cat vectorlong.c
#include <altivec.h>

vector long var[16];

vector long
foo (void)
{
  return var[0];
}

vector long
bar (void)
{
  return var[1];
}

bergner@c643n10lp1:~$ gcc -S -O2 -mcpu=power9 vectorlong.c
bergner@c643n10lp1:~$ cat vectorlong.s
foo:
        [snip toc setup]
        addis 9,2,.LANCHOR0@toc@ha
        addi 9,9,.LANCHOR0@toc@l
        lxv 34,0(9)
        blr
bar:
        [snip toc setup]
        addis 9,2,.LANCHOR0@toc@ha
        addi 9,9,.LANCHOR0@toc@l
        lxv 34,16(9)
        blr

However, for an equivalent test case using scalars (integer or fp), we
do fold the offset into the load, reducing the number of instructions
from three to two:

bergner@c643n10lp1:~$ cat long.c
long var[16];

long
foo (void)
{
  return var[0];
}

long
bar (void)
{
  return var[1];
}

bergner@c643n10lp1:~$ gcc -S -O2 -mcpu=power9 long.c
bergner@c643n10lp1:~$ cat long.s
foo:
        [snip toc setup]
        addis 9,2,.LANCHOR0@toc@ha
        ld 3,.LANCHOR0@toc@l(9)
        blr
bar:
        [snip toc setup]
        addis 9,2,.LANCHOR0+8@toc@ha
        ld 3,.LANCHOR0+8@toc@l(9)
        blr

We should perform the same optimization for vector loads/stores as we do
for scalar loads/stores.
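
For readers unfamiliar with the @ha/@l split, here is a minimal sketch of
the arithmetic behind "folding the low 16 bits into the load". The helper
names (lo16, ha16, fits_ds_form, fits_dq_form) are hypothetical and are
not taken from GCC. The sketch relies only on the encodings: ld is a
DS-form instruction whose displacement must be a multiple of 4, and lxv
is a DQ-form instruction whose displacement must be a multiple of 16.
Whether GCC's actual legitimacy check looks anything like this is not
stated in this report.

```c
#include <stdint.h>

/* Low part of an offset, as @l yields it: the low 16 bits,
   sign-extended.  (Hypothetical helper, not a GCC function.)  */
static int64_t
lo16 (int64_t off)
{
  return ((off & 0xffff) ^ 0x8000) - 0x8000;
}

/* High-adjusted part, as @ha yields it: chosen so that
   ha16 (off) * 65536 + lo16 (off) == off.  The subtraction makes the
   value an exact multiple of 65536, so the division is exact.  */
static int64_t
ha16 (int64_t off)
{
  return (off - lo16 (off)) / 65536;
}

/* DS-form (e.g. ld): the low part is encodable only if it is a
   multiple of 4.  */
static int
fits_ds_form (int64_t lo)
{
  return lo % 4 == 0;
}

/* DQ-form (e.g. lxv): the low part is encodable only if it is a
   multiple of 16.  */
static int
fits_dq_form (int64_t lo)
{
  return lo % 16 == 0;
}
```

Under these assumptions, a low part of 16 (as for var[1] in the vector
test case) satisfies the DQ-form constraint, so in principle it could be
carried by the lxv itself in the same way the scalar case carries its low
part in the ld.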