Hi,

Attached is a fix for what seems to be a reload bug while handling subreg(mem...). I ran into this problem while implementing support for struct load/store in AArch64 using the standard patterns vec_<loadstore>_lanes<large_int_mode><vec_mode> on the same lines of the ARM backend. The test case that caused the issue was:

void SexiALI_Convert(void *vdest, void *vsrc, unsigned int frames, int n)
{
 unsigned int x;
 short *src = vsrc;
 unsigned char *dest = vdest;
 for(x=0;x<256;x++)
 {
  int tmp;
  tmp = *src;
  src++;
  tmp += *src;
  src++;
  *dest++ = tmp;
 }
}

Before reload, this is the RTL dump I see:

.....
(insn 110 114 111 4 (set (reg:V8HI 158 [ vect_var_.21 ])
(subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 0)) ice.i:9 512 {*aarch64_simd_movv8hi}
     (nil))

(insn 111 110 115 4 (set (reg:V8HI 159 [ vect_var_.22 ])
(subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 16)) ice.i:9 512 {*aarch64_simd_movv8hi}
     (expr_list:REG_DEAD (reg:OI 530 [ vect_array.20 ])
        (nil)))

(insn 115 111 116 4 (set (reg:V8HI 161 [ vect_var_.24 ])
(subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 0)) ice.i:9 512 {*aarch64_simd_movv8hi}
     (nil))

(insn 116 115 117 4 (set (reg:V8HI 162 [ vect_var_.25 ])
(subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 16)) ice.i:9 512 {*aarch64_simd_movv8hi}
     (expr_list:REG_DEAD (reg:OI 529 [ vect_array.23 ])
        (nil)))

(insn 117 116 118 4 (set (reg:V4SI 544 [ vect_var_.27 ])
        (sign_extend:V4SI (vec_select:V4HI (reg:V8HI 159 [ vect_var_.22 ])
                (parallel:V8HI [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                    ])))) ice.i:11 700 {aarch64_simd_vec_unpacks_lo_v8hi}
     (nil))

(insn 118 117 124 4 (set (reg:V4SI 545 [ vect_var_.26 ])
        (sign_extend:V4SI (vec_select:V4HI (reg:V8HI 158 [ vect_var_.21 ])
                (parallel:V8HI [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                    ])))) ice.i:9 700 {aarch64_simd_vec_unpacks_lo_v8hi}
     (nil))

.....

In insn 116, reg_equiv_mem () of the psuedoreg 529 is (mem:OI (reg sp)), and the subreg is equivalent to:
        subreg:V8HI (mem:OI (reg sp) 16)
which does not get folded into
        mem:V8HI (plus:DI (reg sp) (const_int 16))
because, in reload.c:find_reloads_toplev () where such subregs are narrowed into
narower memrefs, the memref supplied to strict_memory_address_addr_space_P () is
just (mem:OI (reg sp)) and the SUBREG_BYTE is forgotten. Therefore
strict_memory_address_addr_space_P () thinks that (mem:OI (reg sp)) is a
valid target address and lets it pass as a subreg and does not narrow the subreg
into a narrower memref. find_reloads_toplev () should have infact given
strict_memory_address_addr_space_P () (mem:OI (plus:DI (reg sp) (const_int 16)) ) which will be returned as false as base+offset is invalid for NEON addressing
modes and this will be reloaded into a narrower memref.

Also, I tried writing a secondary reload for this, but at no time is the RTL
        (subreg:V8HI (mem:OI (reg sp)) 16)
available to the target secondary reload for it to fix it up.

Therefore, I've fixed find_reloads_toplev () to pass the full address to strict_memory_address_addr_space_P () in the case of subregs.

Does this look like a sane fix?

I've tested this patch on arm-none-eabi and bootstrapped on x86_64-pc-linux and
all is well.

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-06-28  Tejas Belagod  <tejas.bela...@arm.com>

gcc/
        * reload.c (find_reloads_toplev): Include the subreg byte in the address
        of memrefs when converting subregs of mems into narrower memrefs.
diff --git a/gcc/reload.c b/gcc/reload.c
index e42cc5c..b6d4ce9 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -4771,15 +4771,27 @@ find_reloads_toplev (rtx x, int opnum, enum reload_type 
type,
 #ifdef LOAD_EXTEND_OP
          && !paradoxical_subreg_p (x)
 #endif
-         && (reg_equiv_address (regno) != 0
-             || (reg_equiv_mem (regno) != 0
-                 && (! strict_memory_address_addr_space_p
-                     (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
-                      MEM_ADDR_SPACE (reg_equiv_mem (regno)))
-                     || ! offsettable_memref_p (reg_equiv_mem (regno))
-                     || num_not_at_initial_offset))))
-       x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
-                                          insn, address_reloaded);
+        )
+       {
+         if (reg_equiv_address (regno) != 0)
+           x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
+                                            insn, address_reloaded);
+         else if (reg_equiv_mem (regno) != 0)
+           {
+             tem =
+               simplify_gen_subreg (GET_MODE (x), reg_equiv_mem (regno),
+                                    GET_MODE (SUBREG_REG (x)),
+                                    SUBREG_BYTE (x));
+             gcc_assert (tem);
+             if (MEM_P (tem)
+                 && (!strict_memory_address_addr_space_p
+                       (GET_MODE (x), tem, MEM_ADDR_SPACE (tem))
+                     || !offsettable_memref_p (tem)
+                     || num_not_at_initial_offset))
+               x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
+                                                insn, address_reloaded);
+           }
+       }
     }
 
   for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)

Reply via email to