Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

Peter Bergner Sat, 09 Jul 2016 08:27:56 -0700

On 7/6/16 12:53 PM, David Edelsohn wrote:
> On Tue, Jul 5, 2016 at 10:26 PM, Peter Bergner <berg...@vnet.ibm.com> wrote:
>> The following patch fixes a bug where we do not disable POWER9 vector dform
>> addressing when we compile for POWER9 but without VSX support.  This 
>> manifested
>> itself with us trying to use dform addressing with altivec loads/stores
>> which is illegal, leading to an ICE.
> 
> Peter,
> 
> DFORM definitely should be disabled without VSX, but the patch seems
> incomplete.  If VSX and DFORM are enabled, and GCC chooses an Altivec
> instruction alternative in a pattern, what is to prevent the
> generation of a DFORM address?


Adding Uli and Alan to the CC since there are some reload questions. :-)

So I dug into this a little more.  With -mcpu=power9 (ie, no -mno-vsx),
I could not get an altivec pattern generated, without resorting to using
an altivec builtin.  The altivec builtins also seem to force reg+reg
addressing, since hey, that's all they support.  So I couldn't create
a test case to answer your question of what if VSX and DFORM are enabled
and we have an altivec pattern with a dform address.  Maybe there should
be a test case that generates one of the altivec load/store patterns,
but I couldn't do it.

To try and answer your question, I went back to using -mcpu=power9 -mno-vsx
with unpatched trunk (ie, without my patch to disable vector dform when
power9 vector is disabled).  Looking at the following simple load test
case, it seems reload is capable of fixing up dform addresses when they're
used in an altivec pattern:

bergner@genoa:~/gcc/BUGS/sawdey/altivec$ cat vec_load.i
typedef __attribute__((altivec(vector__))) __attribute__((aligned(16))) 
unsigned char vec_t;
vec_t
foo (vec_t *dst)
{
  return dst[1];
}
bergner@genoa:~/gcc/BUGS/sawdey/altivec$ 
/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc/xgcc 
-B/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc -S -O1 -mcpu=power9 
-mno-vsx vec_load.i -fdump-rtl-all-all

This gives us the following just before reload:

(insn 11 6 12 2 (set (reg/i:V16QI 79 2)
        (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_2(D) + 16B]+0 
S16 A128])) vec_load.i:6 1209 {*altivec_movv16qi}
     (expr_list:REG_DEAD (reg:DI 3 3 [ dstD.2412 ])
        (nil)))

During reload, we see a reload for input:

Spilling for insn 11.

Reloads for insn # 11
Reload 0: reload_in (DI) = (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                                                    (const_int 16 [0x10]))
        BASE_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 16
        reload_in_reg: (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                                                    (const_int 16 [0x10]))
        reload_reg_rtx: (reg:DI 3 3)
Insns emitted for these reloads:
(insn 18 16 11 2 (set (reg:DI 3 3)
        (plus:DI (reg:DI 3 3 [ dstD.2412 ])
            (const_int 16 [0x10]))) vec_load.i:6 75 {*adddi3}
     (nil))

...which fixes the address so it satisfies its constraints and leaves us
with the rtl we want:

(insn 18 6 11 2 (set (reg:DI 3 3)
        (plus:DI (reg:DI 3 3 [ dstD.2412 ])
            (const_int 16 [0x10]))) vec_load.i:6 75 {*adddi3}
     (nil))
(insn 11 18 12 2 (set (reg/i:V16QI 79 2)
        (mem:V16QI (reg:DI 3 3) [0 MEM[(vec_tD.2411 *)dst_2(D) + 16B]+0 S16 
A128])) vec_load.i:6 1209 {*altivec_movv16qi}
     (nil))

So to answer your question, at least for the altivec load case, reload prevents
us from using reg+offset addressing.


The problem occurs when we have an altivec store, which is what the original
test case had:

bergner@genoa:~/gcc/BUGS/sawdey/altivec$ cat vec_store.i
typedef __attribute__((altivec(vector__))) __attribute__((aligned(16))) 
unsigned char vec_t;
void
foo (vec_t *dst, vec_t src)
{
  dst[1] = src;
}
bergner@genoa:~/gcc/BUGS/sawdey/altivec$ 
/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc/xgcc 
-B/home/bergner/gcc/build/gcc-fsf-mainline-debug-3/gcc -S -O1 -mcpu=power9 
-mno-vsx vec_store.i -fdump-rtl-all-all

This gives us the following just before reload:

(insn 7 4 0 2 (set (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                (const_int 16 [0x10])) [0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 
S16 A128])
        (reg:V16QI 79 2 [ srcD.2413 ])) vec_store.i:5 1209 {*altivec_movv16qi}
     (expr_list:REG_DEAD (reg:V16QI 79 2 [ srcD.2413 ])
        (expr_list:REG_DEAD (reg:DI 3 3 [ dstD.2412 ])
            (nil))))


During reload, we see a reload for output:

Spilling for insn 7.
Using reg 9 for reload 1
Spilling for insn 7.
Using reg 9 for reload 1

Reloads for insn # 7
Reload 0: reload_out (V16QI) = (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                                                        (const_int 16 [0x10])) 
[0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 S16 A128])
        NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
        reload_out_reg: (mem:V16QI (plus:DI (reg:DI 3 3 [ dstD.2412 ])
                                                        (const_int 16 [0x10])) 
[0 MEM[(vec_tD.2411 *)dst_1(D) + 16B]+0 S16 A128])
Reload 1: reload_in (V16QI) = (reg:V16QI 79 2 [ srcD.2413 ])
        GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg:V16QI 79 2 [ srcD.2413 ])
        reload_reg_rtx: (reg:V16QI 9 9)
Insns emitted for these reloads:
(insn 12 11 13 2 (set (mem/c:V16QI (plus:DI (reg/f:DI 1 1)
                (const_int -16 [0xfffffffffffffff0])) [0  S16 A128])
        (reg:V16QI 79 2 [ srcD.2413 ])) vec_store.i:5 -1
     (nil))

(insn 13 12 7 2 (set (reg:V16QI 9 9)
        (mem/c:V16QI (plus:DI (reg/f:DI 1 1)
                (const_int -16 [0xfffffffffffffff0])) [0  S16 A128])) 
vec_store.i:5 -1
     (nil))


The Reload 1 on the other hand is totally bogus.  I'd expect to see an input 
reload
for the reg+offset address.  Instead, it reloads the src altivec reg and really
creates a mess, since it's loading into a GPR and not a altivec register and
it's using a reg+offset address which is wrong here.

My guess is that we created the input reload for the src reg because
rs6000_legitimate_address_p() and quad_addredss_p() still say that a reg+offset
address for a V16QImode mode is ok, even though in this specific case, it
isn't since we're targeting an altivec store pattern...but they don't know
that.  So reload seems to say, if the address is ok, the src reg must need
reloading.  Did I get that right???

So how do we fix this?  It seems we need a way to disambiguate vector load/store
addresses that should allow reg+offset addresses (when they're targeting one of
the new power9 dform load/store patterns and when they should not allow 
reg+offset
addresses (when they're targeting older altivec and vsx load/stores that only
support reg+reg addressing).  Does anyone have an idea of how we can accomplish
that?

I'll note that I tried to modify rs6000_secondary_reload_memory to catch
the case where we are passing in an vector dform address with a regclass of
ALTIVEC_REGS, but that didn't seem to help.  That might be because
rs6000_legitimate_address_p() and quad_addredss_p() still say the reg+offset
address are ok.  Do we need to add an extra argument to those to tell them
to not allow reg+offset addressing when we know it's not allowed?

Peter

Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

Reply via email to