On 11 December 2013 19:25, Vladimir Makarov <vmaka...@redhat.com> wrote: > On 12/11/2013, 5:35 AM, Yvan Roux wrote: >> >> Hi Vladimir, >> >> I've some regressions on ARM after this SP elimination patch, and they >> are execution failures. Here is the list: >> >> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus >> gcc.c-torture/execute/va-arg-22.c -O2 >> gcc.dg/atomic/c11-atomic-exec-5.c -O0 >> gfortran.dg/direct_io_12.f90 -O[23] >> gfortran.dg/elemental_dependency_1.f90 -O2 >> gfortran.dg/matmul_2.f90 -O2 >> gfortran.dg/matmul_6.f90 -O2 >> gfortran.dg/mvbits_7.f90 -O3 >> gfortran.dg/unlimited_polymorphic_1.f03 -O3 >> >> I reduced and looked at var-arg-22.c and the issue is that in >> lra_eliminate_regs_1 (called by get_equiv_with_elimination) we >> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What >> we try to do here is to change the pseudo 195 of the insn 118 below : >> >> (insn 118 114 112 8 (set (reg:DI 195) >> (unspec:DI [ >> (mem:DI (plus:SI (reg/f:SI 215) >> (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 >> + 64B]+8 S8 A8]) >> ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} >> (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) >> (const_int 8 [0x8])) [7 a35+8 S8 A32]) >> (nil))) >> >> with its equivalent (x arg of lra_eliminate_regs_1): >> >> (mem/c:DI (plus:SI (reg/f:SI 102 sfp) >> (const_int 76 [0x4c])) [7 a35+8 S8 A32]) >> >> lra_eliminate_regs_1 is called with full_p = true (it is not really >> clear for what it means), > > > It means we use full offset between the regs, otherwise we use change in the > full offset from the previous iteration (it can be changed as we reserve > stack memory for spilled pseudos and the reservation can be done several > times). As equiv value is stored as it was before any elimination, we need > always to use full offset to make elimination.
Ok thanks it's clearer. > but in the PLUS switch case, we have offset >> >> = 0xb (given by ep->offset) and as lra_get_insn_recog_data >> (insn)->sp_offset value is 0, we will indeed add 0xb to the original >> 0x4c offset. >> > > 0 value is suspicious because it is default. We might have not set up it > from neighbor insns. > > > >> So, here I don't get if it is the sp_offset value of the >> lra_insn_recog_data element which is not well updated or if lra_ >> eliminate_regs_1 has to be called with update_p and not full_p (which >> fixed the value in that particular case). Is it more obvious for you >> ? >> > > Yvan, could you send me the reduced preprocessed case and the options for > cc1 to reproduce it. Here is cc1 command line : cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard -mfpu=neon -mthumb v2.c -O2 I use a native build on a chromebook, but it's reproducible with a cross compiler. With the attached test case the issue is when processing insn 118. Thanks, Yvan
typedef __builtin_va_list __gnuc_va_list; typedef __gnuc_va_list va_list; extern void abort (void); extern void exit (int); void bar (int n, int c) { static int lastn = -1, lastc = -1; if (lastn != n) { if (lastc != lastn) abort (); lastc = 0; lastn = n; } if (c != (char) (lastc ^ (n << 3))) abort (); lastc++; } typedef struct { char x[31]; } A31; typedef struct { char x[32]; } A32; typedef struct { char x[35]; } A35; typedef struct { char x[72]; } A72; void foo (int size, ...) { A31 a31; A32 a32; A35 a35; A72 a72; va_list ap; int i; if (size != 21) abort (); __builtin_va_start(ap,size); a31 = __builtin_va_arg(ap,typeof (a31)); for (i = 0; i < 31; i++) bar (31, a31.x[i]); a32 = __builtin_va_arg(ap,typeof (a32)); for (i = 0; i < 32; i++) bar (32, a32.x[i]); a35 = __builtin_va_arg(ap,typeof (a35)); for (i = 0; i < 35; i++) bar (35, a35.x[i]); a72 = __builtin_va_arg(ap,typeof (a72)); for (i = 0; i < 72; i++) bar (72, a72.x[i]); __builtin_va_end(ap); } int main (void) { A31 a31; A32 a32; A35 a35; A72 a72; int i; for (i = 0; i < 31; i++) a31.x[i] = i ^ (31 << 3); for (i = 0; i < 32; i++) a32.x[i] = i ^ (32 << 3); for (i = 0; i < 35; i++) a35.x[i] = i ^ (35 << 3); for (i = 0; i < 72; i++) a72.x[i] = i ^ (72 << 3); foo (21, a31, a32, a35, a72); exit (0); }