On 11 December 2013 19:25, Vladimir Makarov <vmaka...@redhat.com> wrote:
> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>
>> Hi Vladimir,
>>
>> I've some regressions on ARM after this SP elimination patch, and they
>> are execution failures.  Here is the list:
>>
>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>> gcc.c-torture/execute/va-arg-22.c  -O2
>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>> gfortran.dg/direct_io_12.f90  -O[23]
>> gfortran.dg/elemental_dependency_1.f90  -O2
>> gfortran.dg/matmul_2.f90  -O2
>> gfortran.dg/matmul_6.f90  -O2
>> gfortran.dg/mvbits_7.f90  -O3
>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>
>> I reduced and looked at va-arg-22.c and the issue is that in
>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>> transformed sfp + 0x4c into sp + 0xfc because of a bad sp offset.  What
>> we are trying to do here is replace pseudo 195 in insn 118 below:
>>
>> (insn 118 114 112 8 (set (reg:DI 195)
>>          (unspec:DI [
>>                  (mem:DI (plus:SI (reg/f:SI 215)
>>                          (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>> + 64B]+8 S8 A8])
>>              ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>       (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>                  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>          (nil)))
>>
>> with its equivalent (x arg of lra_eliminate_regs_1):
>>
>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>          (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>
>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>> clear to me what that means),
>
>
> It means we use the full offset between the regs; otherwise we use the
> change in the full offset since the previous iteration (the offset can
> change because we reserve stack memory for spilled pseudos, and the
> reservation can be done several times).  As the equiv value is stored as it
> was before any elimination, we always need to use the full offset to do the
> elimination.
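
The distinction described above can be sketched as a toy model in C. This is only an illustration under stated assumptions, not GCC's code: `struct elim_model`, `applied_offset` and `rewrite_disp` are hypothetical names, and the previous-iteration offset 0xa0 used in the note below is invented. The displacements 0x4c and 0xfc come from the report, and 0xb0 is simply their difference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not GCC's data structure: one register
   elimination (for example sfp -> sp) with the offset computed on the
   current iteration and the one computed on the previous iteration.  */
struct elim_model {
  long offset;           /* full distance between the two registers now */
  long previous_offset;  /* full distance on the previous iteration */
};

/* Offset to add when the eliminated register is replaced.
   full_p != 0: the expression has never been eliminated, so the full
   offset is needed.
   full_p == 0: the expression already carries previous_offset, so only
   the change since the previous iteration is added.  */
static long
applied_offset (const struct elim_model *m, int full_p)
{
  assert (m != NULL);
  return full_p ? m->offset : m->offset - m->previous_offset;
}

/* Rewrite the displacement of "from_reg + disp" for "to_reg + result".  */
static long
rewrite_disp (long disp, const struct elim_model *m, int full_p)
{
  return disp + applied_offset (m, full_p);
}
```

With `offset = 0xb0`, an untouched equivalence at displacement 0x4c rewritten with `full_p` set gives 0xfc. An expression that was already rewritten with the invented previous offset 0xa0 (displacement 0xec) needs only the 0x10 change to reach the same 0xfc. Applying the full offset to an already-rewritten expression, or the delta to an untouched one, would give a wrong address, which is why the stored equivalence has to be processed with the full offset.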

OK, thanks, that's clearer.

>> but in the PLUS switch case, we have offset = 0xb0 (given by ep->offset)
>> and, as the lra_get_insn_recog_data (insn)->sp_offset value is 0, we
>> will indeed add 0xb0 to the original 0x4c offset.
>>
>
> A 0 value is suspicious because it is the default.  We might not have set it
> up from neighboring insns.
>
>
>
>> So, here I don't get whether it is the sp_offset value of the
>> lra_insn_recog_data element which is not properly updated, or whether
>> lra_eliminate_regs_1 has to be called with update_p and not full_p (which
>> fixed the value in that particular case).  Is it more obvious to you?
>>
>
> Yvan, could you send me the reduced preprocessed case and the options for
> cc1 to reproduce it?


Here is the cc1 command line:

cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
-mfpu=neon -mthumb  v2.c  -O2

I use a native build on a chromebook, but it's reproducible with a
cross compiler.

With the attached test case, the issue shows up when processing insn 118.

Thanks,
Yvan
typedef __builtin_va_list __gnuc_va_list;
typedef __gnuc_va_list va_list;

extern void abort (void);
extern void exit (int);


void bar (int n, int c)
{
  static int lastn = -1, lastc = -1;

  if (lastn != n) {
    if (lastc != lastn)
      abort ();
    lastc = 0;
    lastn = n;
  }

  if (c != (char) (lastc ^ (n << 3)))
    abort ();
  lastc++;
}

 typedef struct { char x[31]; } A31;
 typedef struct { char x[32]; } A32;
 typedef struct { char x[35]; } A35;
 typedef struct { char x[72]; } A72;

void foo (int size, ...)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;


  va_list ap;

  int i;

  if (size != 21) abort ();

  __builtin_va_start(ap,size);

 a31 = __builtin_va_arg(ap,typeof (a31));
 for (i = 0; i < 31; i++) bar (31, a31.x[i]);
 a32 = __builtin_va_arg(ap,typeof (a32));
 for (i = 0; i < 32; i++) bar (32, a32.x[i]);
 a35 = __builtin_va_arg(ap,typeof (a35));
 for (i = 0; i < 35; i++) bar (35, a35.x[i]);
 a72 = __builtin_va_arg(ap,typeof (a72));
 for (i = 0; i < 72; i++) bar (72, a72.x[i]);

  __builtin_va_end(ap);

}

int main (void)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;
 int i;
 for (i = 0; i < 31; i++) a31.x[i] = i ^ (31 << 3);
 for (i = 0; i < 32; i++) a32.x[i] = i ^ (32 << 3);
 for (i = 0; i < 35; i++) a35.x[i] = i ^ (35 << 3);
 for (i = 0; i < 72; i++) a72.x[i] = i ^ (72 << 3);

  foo (21, a31, a32, a35, a72);
  exit (0);

}
