Jiong Wang writes:

> Andrew Pinski writes:
>
>> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >
>>>> > Andrew Pinski writes:
>>>> >
>>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >>>
>>>> >>> James Greenhalgh writes:
>>>> >>>
>>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>> >>>>> Current IRA still use both target macros in a few places.
>>>> >>>>>
>>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>>>> >>>>> calculation. Allocate caller saved first, then callee saved.
>>>> >>>>>
>>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>>> >>>>> pure caller saved when used in leaf function.
>>>> >>>>>
>>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping 
>>>> >>>>> some
>>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the 
>>>> >>>>> number
>>>> >>>>> is smaller.
>>>> >>>>>
>>>> >>>>> OK for trunk?
>>>> >>>>
>>>> >>>> OK, sorry for the delay.
>>>> >>>>
>>>> >>>> It might be mail client mangling, but please check that the trailing 
>>>> >>>> slashes
>>>> >>>> line up in the version that gets committed.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> James
>>>> >>>>
>>>> >>>>> 2015-05-19  Jiong. Wang  <jiong.w...@arm.com>
>>>> >>>>>
>>>> >>>>> gcc/
>>>> >>>>>  PR 63521
>>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>>> >>>
>>>> >>> Patch reverted.
>>>> >>
>>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>>> >> missing an email or something.
>>>> >
>>>> > There are several execution regressions under gcc testsuite, although as
>>>> > far as I can see it's this patch exposed hidding bugs in those
>>>> > testcases, but there might be one other issue, so to be conservative, I
>>>> > temporarily reverted this patch.
>>>>
>>>> If you are talking about:
>>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>>> Etc.
>>>>
>>>> These test cases are too dependent on the original register allocation 
>>>> order
>>>> and really can be safely ignored. Really these three tests should be moved 
>>>> or
>>>> written in a more sane way.
>>>
>>> Yup, completely agreed - but the testcases do throw up something
>>> interesting. If we are allocating registers to hold 128-bit values, and
>>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>>> I think we probably see the same thing if the first thing we do in a
>>> function is a structure copy through a back-end expanded movmem, which
>>> will likely begin with a 128-bit LDP using x7, x8.
>>>
>>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>>> followed by x8, then we've potentially made a sub-optimal decision, our
>>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>>
>>> My hunch is that we *might* get better code generation in this corner case
>>> out of some permutation of the allocation order for argument
>>> registers. I'm thinking something along the lines of
>>>
>>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>>
>>> I asked Jiong to take a look at that, and I agree with his decision to
>>> reduce the churn on trunk and just revert the patch until we've come to
>>> a conclusion based on some evidence - rather than just my hunch! I agree
>>> that it would be harmless on trunk from a testing point of view, but I
>>> think Jiong is right to revert the patch until we better understand the
>>> code-generation implications.
>>>
>>> Of course, it might be that I am completely wrong! If you've already taken
>>> a look at using a register allocation order like the example I gave and
>>> have something to share, I'd be happy to read your advice!
>>
>> Any news on this patch?  It has been a year since it was reverted for
>> a bad test that was failing.
>
> Hi Andrew,
>
>   Yeah, those tests actually are expected to fail once register
>   allocation order changed, it's clearly documented in the comments:
>
>   gcc.target/aarch64/aapcs64/abitest-2.h:
>   
> /* ...
>
>    Note that for value that is returned in the caller-allocated memory
>    block, we get the address from the saved x8 register.  x8 is saved
>    just after the callee is returned; we assume that x8 has not been
>    clobbered at then, although there is no requirement for the callee
>    preserve the value stored in x8.  Luckily, all test cases here are
>    simple enough that x8 doesn't normally get clobbered (although not
>    guaranteed).  */
>
>   I had a local fix which use the redundant value returned to x0 to
>   repair the clobbered value in x8 as they will be identical for
>   structure type return, however that trick can't play anymore as we
>   recently defined TARGET_OMIT_STRUCT_RETURN_REG to true which will
>   remove that redundant x8 to x0 copy.
>   
>   Anyway, I will come back with some benchmark results of this patch on
>   top of latest trunk after the weekend run, and also answers to Jame's
>   concerns.

There is very tiny performance regression on SPEC2K6INT
464.h264ref. Checking the codegen, there are some bad instruction
scheduling, it looks to me caused by REG_ALLOC_ORDER is not used
consistently inside IRA that parts of the code are using new allocation
order while some other parts are still using original order.

I see in ira.c:setup_class_hard_regs we are checking whether
REG_ALLOC_ORDER is defined:

  #ifdef REG_ALLOC_ORDER
          hard_regno = reg_alloc_order[i];
  #else
          hard_regno = i;
  #endif

but in ira-color.c:assign_hard_reg, there is no similar check:

  /* We don't care about giving callee saved registers to allocnos no
     living through calls because call clobbered registers are
     allocated first (it is usual practice to put them first in
     REG_ALLOC_ORDER).  */
  mode = ALLOCNO_MODE (a);
  for (i = 0; i < class_size; i++)
    {
      hard_regno = ira_class_hard_regs[aclass][i];

We might want to use reg_alloc_order to map above "i" if REG_ALLOC_ORDER is
defined?

Vlad, any comments?

-- 
Regards,
Jiong

Reply via email to