Re: New regression on ARM Linux

Richard Earnshaw Tue, 31 Mar 2015 03:32:03 -0700

On 31/03/15 11:20, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Biener wrote:
> 
>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>
>>> On 31/03/15 08:50, Richard Biener wrote:
>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <[email protected]> wrote:
>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence 
>>>>> <[email protected]> wrote:
>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>> it.
>>>>>>
>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>> varargs. From
>>>>>> the "good" -fdump-rtl-expand:
>>>>>>
>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S4 A32])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> The struct members are
>>>>>> reg:SI 113 => int a;
>>>>>> reg:DF 112 => double b;
>>>>>> reg:SI 111 => int c;
>>>>>>
>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>> pushed
>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>> build_ref_of_offset, we get:
>>>>>>
>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>> virtual-outgoing-args)
>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S8 A64])
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>
>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>> because
>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>> argument (the
>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>> (aapcs_layout_arg,
>>>>>> arm.c line ~~5914)
>>>>>>
>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>      next even number.  */
>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>     ncrn++;
>>>>>>
>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>> testcase
>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>
>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>> I think
>>>>>> because these use the type of the declared parameters, rather than the
>>>>>> provided
>>>>>> arguments, and the former do not have the increased alignment from
>>>>>> build_ref_for_offset.
>>>>>
>>>>> It doesn't make sense to use the alignment of passed values.  That looks 
>>>>> like bs.
>>>>>
>>>>> This means that
>>>>>
>>>>> Int I __aligned__(8);
>>>>>
>>>>> Is passed differently than int.
>>>>>
>>>>> Arm_function_arg needs to be fixed.
>>>>
>>>> That is,
>>>>
>>>> typedef int myint __attribute__((aligned(8)));
>>>>
>>>> int main()
>>>> {
>>>>   myint i = 1;
>>>>   int j = 2;
>>>>   __builtin_printf("%d %d\n", i, j);
>>>> }
>>>>
>>>> or
>>>>
>>>> myint i;
>>>> int j;
>>>> myint *p = &i;
>>>> int *q = &j;
>>>>
>>>> int main()
>>>> {
>>>>   __builtin_printf("%d %d", *p, *q);
>>>> }
>>>>
>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>> non-aligned values only makes sense for things passed by value on the 
>>>> stack.
>>>> And then obviously only dependent on the functuion type signature, not
>>>> on the type of the passed value.
>>>>
>>>
>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>> such a modifier, doesn't mean that another variadic function might not
>>> have a means to detect when an object in the variadic list needs to be
>>> over-aligned.  As such, the test should really be written as:
>>
>> A value doesn't have "alignment".  A function may have alignment
>> requirements on its arguments, clearly printf doesn't.
> 
> Btw, it looks like function_arg is supposed to pass whether the argument
> is to the variadic part of the function or not, but it gets passed
> false as in
> 
>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>                                                 argpos < n_named_args);
> 
> n_named_args is 4.  That is because ARM asks to do this via
> 
>   if (type_arg_types != 0
>       && targetm.calls.strict_argument_naming (args_so_far))
>     ;
>   else if (type_arg_types != 0
>            && ! targetm.calls.pretend_outgoing_varargs_named 
> (args_so_far))
>     /* Don't include the last named arg.  */
>     --n_named_args;
>   else
>     /* Treat all args as named.  */
>     n_named_args = num_actuals;
> 
> thus you lose that info (you could have that readily available).
>


>From the calling side of a function it shouldn't matter.  They only
thing the caller has to know is that the function is variadic (and
therefore that the base-standard rules from the AAPCS apply -- no use of
FP registers for parameters).  The behaviour after that is *as if* all
the arguments were pushed onto the stack in the correct order and
finally the lowest 4 words are popped off the stack again into r0-r3.
Hence any alignment that would apply to arguments on the stack has to be
reflected in their allocation into r0-r3 (since the push/pop of those
registers never happens).

R.

>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", (int) i, j);
>>> }
>>>
>>> Variadic functions take the types of their arguments from the types of
>>> the actuals passed.  The compiler should either be applying appropriate
>>> promotion rules to make the types conformant by the language
>>> specification or respecting the types exactly.  However, that should be
>>> done in the mid-end not the back-end.  If incorrect alignment
>>> information is passed to the back-end it can't help but make the wrong
>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>
>>> int foo (int a, int b, int c, int d, int e, int f
>>> __attribute__((aligned(8))));
>>>
>>> On ARM that should pass f in a 64-bit aligned location (since the
>>> parameter will be on the stack).
>>>
>>> R.
>>>
>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> FWIW, I also tried:
>>>>>>
>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>> int main (void)
>>>>>> {
>>>>>>   __builtin_printf("%d\n", x);
>>>>>> }
>>>>>>
>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>> alignment 32
>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>> has
>>>>>> alignment 128.
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>
>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>
>>>>>>>>> ...actually attach the testcase...
>>>>>>>> What compile options?
>>>>>>>
>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>
>>>>>>>         .section        .rodata
>>>>>>>         .align  3
>>>>>>> .LANCHOR0 = . + 0
>>>>>>> .LC1:
>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>         .space  6
>>>>>>> .LC0:
>>>>>>>         .word   7
>>>>>>>         .space  4
>>>>>>>         .word   0
>>>>>>>         .word   1075838976
>>>>>>>         .word   9
>>>>>>>         .space  4
>>>>>>>
>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>> to 8 bytes.
>>>>>>>
>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>> testsuite on ARM
>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>> following this
>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>> (including
>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>> the
>>>>>>>>>> expected
>>>>>>>>>> 7 8 9
>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>> 0 8 0
>>>>>>>>>>
>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>> a
>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>> sizes-gimplified type_0
>>>>>>>>>> BLK
>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>> 0x2b9b8d428d20
>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>> context
>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>> D.6070>
>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>> <type_decl
>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>
>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>> does not
>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>
>>>>>>>>>> Still investigating...
>>>>>>>>>>
>>>>>>>>>> --Alan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>> 2015-03-11  Richard Biener  <[email protected]>
>>>>>>>>>>>
>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>  alignment.
>>>>>>>>>>>
>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>
>>>>>> ===================================================================
>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>> off);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>

Re: New regression on ARM Linux

Reply via email to