On 7/6/16 12:53 PM, David Edelsohn wrote: > On Tue, Jul 5, 2016 at 10:26 PM, Peter Bergner <berg...@vnet.ibm.com> wrote: >> The following patch fixes a bug where we do not disable POWER9 vector dform >> addressing when we compile for POWER9 but without VSX support. This >> manifested >> itself with us trying to use dform addressing with altivec loads/stores >> which is illegal, leading to an ICE. > > Peter, > > DFORM definitely should be disabled without VSX, but the patch seems > incomplete. If VSX and DFORM are enabled, and GCC chooses an Altivec > instruction alternative in a pattern, what is to prevent the > generation of a DFORM address?
Adding Uli and Alan to the CC since there are some reload questions. :-) So I dug into this a little more. With -mcpu=power9 (ie, no -mno-vsx), I could not get an altivec pattern generated, without resorting to using an altivec builtin. The altivec builtins also seem to force reg+reg addressing, since hey, that's all they support. So I couldn't create a test case to answer your question of what if VSX and DFORM are enabled and we have an altivec pattern with a dform address. Maybe there should be a test case that generates one of the altivec load/store patterns, but I couldn't do it. To try and answer your question, I went back to using -mcpu=power9 -mno-vsx with unpatched trunk (ie, without my patch to disable vector dform when power9 vector is disabled). Looking at the following simple load test case, it seems reload is capable of fixing up dform addresses when they're used in an altivec pattern: bergner@genoa:~/gcc/BUGS/sawdey/altivec$ cat vec_load.i typedef __attribute__((altivec(vector__))) __attribute__((aligned(16))) unsigned char vec_t; vec_t foo (vec_t *dst) { return dst[1]; } bergner@genoa:~/gcc/BUGS/sawdey/altivec$ /home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc -S -O1 -mcpu=power9 -mno-vsx vec_load.i -fdump-rtl-all-all This gives us the following just before reload: (insn 11 6 12 2 (set (reg/i:V16QI 79 2) (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_2(D) + 16B]+0 S16 A128])) vec_load.i:6 1209 {*altivec_movv16qi} (expr_list:REG_DEAD (reg:DI 3 3 [ dstD.2412 ]) (nil))) During reload, we see a reload for input: Spilling for insn 11. Reloads for insn # 11 Reload 0: reload_in (DI) = (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) BASE_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 16 reload_in_reg: (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) reload_reg_rtx: (reg:DI 3 3) Insns emitted for these reloads: (insn 18 16 11 2 (set (reg:DI 3 3) (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10]))) vec_load.i:6 75 {*adddi3} (nil)) ...which fixes the address so it satisfies its constraints and leaves us with the rtl we want: (insn 18 6 11 2 (set (reg:DI 3 3) (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10]))) vec_load.i:6 75 {*adddi3} (nil)) (insn 11 18 12 2 (set (reg/i:V16QI 79 2) (mem:V16QI (reg:DI 3 3) [0 MEM[(vec_tD.2411 *)dst_2(D) + 16B]+0 S16 A128])) vec_load.i:6 1209 {*altivec_movv16qi} (nil)) So to answer your question, at least for the altivec load case, reload prevents us from using reg+offset addressing. The problem occurs when we have an altivec store, which is what the original test case had: bergner@genoa:~/gcc/BUGS/sawdey/altivec$ cat vec_store.i typedef __attribute__((altivec(vector__))) __attribute__((aligned(16))) unsigned char vec_t; void foo (vec_t *dst, vec_t src) { dst[1] = src; } bergner@genoa:~/gcc/BUGS/sawdey/altivec$ /home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc -S -O1 -mcpu=power9 -mno-vsx vec_store.i -fdump-rtl-all-all This gives us the following just before reload: (insn 7 4 0 2 (set (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 S16 A128]) (reg:V16QI 79 2 [ srcD.2413 ])) vec_store.i:5 1209 {*altivec_movv16qi} (expr_list:REG_DEAD (reg:V16QI 79 2 [ srcD.2413 ]) (expr_list:REG_DEAD (reg:DI 3 3 [ dstD.2412 ]) (nil)))) During reload, we see a reload for output: Spilling for insn 7. Using reg 9 for reload 1 Spilling for insn 7. Using reg 9 for reload 1 Reloads for insn # 7 Reload 0: reload_out (V16QI) = (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 S16 A128]) NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional reload_out_reg: (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ]) (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 S16 A128]) Reload 1: reload_in (V16QI) = (reg:V16QI 79 2 [ srcD.2413 ]) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1) reload_in_reg: (reg:V16QI 79 2 [ srcD.2413 ]) reload_reg_rtx: (reg:V16QI 9 9) Insns emitted for these reloads: (insn 12 11 13 2 (set (mem/c:V16QI (plus:DI (reg/f:DI 1 1) (const_int -16 [0xfffffffffffffff0])) [0 S16 A128]) (reg:V16QI 79 2 [ srcD.2413 ])) vec_store.i:5 -1 (nil)) (insn 13 12 7 2 (set (reg:V16QI 9 9) (mem/c:V16QI (plus:DI (reg/f:DI 1 1) (const_int -16 [0xfffffffffffffff0])) [0 S16 A128])) vec_store.i:5 -1 (nil)) The Reload 1 on the other hand is totally bogus. I'd expect to see an input reload for the reg+offset address. Instead, it reloads the src altivec reg and really creates a mess, since it's loading into a GPR and not a altivec register and it's using a reg+offset address which is wrong here. My guess is that we created the input reload for the src reg because rs6000_legitimate_address_p() and quad_addredss_p() still say that a reg+offset address for a V16QImode mode is ok, even though in this specific case, it isn't since we're targeting an altivec store pattern...but they don't know that. So reload seems to say, if the address is ok, the src reg must need reloading. Did I get that right??? So how do we fix this? It seems we need a way to disambiguate vector load/store addresses that should allow reg+offset addresses (when they're targeting one of the new power9 dform load/store patterns and when they should not allow reg+offset addresses (when they're targeting older altivec and vsx load/stores that only support reg+reg addressing). Does anyone have an idea of how we can accomplish that? I'll note that I tried to modify rs6000_secondary_reload_memory to catch the case where we are passing in an vector dform address with a regclass of ALTIVEC_REGS, but that didn't seem to help. That might be because rs6000_legitimate_address_p() and quad_addredss_p() still say the reg+offset address are ok. Do we need to add an extra argument to those to tell them to not allow reg+offset addressing when we know it's not allowed? Peter