Jiong Wang writes:
> Andrew Pinski writes:
>
>> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >
>>>> > Andrew Pinski writes:
>>>> >
>>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >>>
>>>> >>> James Greenhalgh writes:
>>>> >>>
>>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>> >>>>> Current IRA still uses both target macros in a few places.
>>>> >>>>>
>>>> >>>>> Tell IRA to use the order we defined rather than its own cost
>>>> >>>>> calculation: allocate caller-saved registers first, then
>>>> >>>>> callee-saved ones.
>>>> >>>>>
>>>> >>>>> This is especially useful for LR/x30, as it's free to allocate
>>>> >>>>> and is purely caller-saved when used in a leaf function.
>>>> >>>>>
>>>> >>>>> I haven't noticed a significant impact on benchmarks, but
>>>> >>>>> grepping for keywords like "Spilling", "Push.*spill" etc. in the
>>>> >>>>> IRA RTL dump shows the numbers are smaller.
>>>> >>>>>
>>>> >>>>> OK for trunk?
>>>> >>>>
>>>> >>>> OK, sorry for the delay.
>>>> >>>>
>>>> >>>> It might be mail client mangling, but please check that the
>>>> >>>> trailing slashes line up in the version that gets committed.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> James
>>>> >>>>
>>>> >>>>> 2015-05-19  Jiong Wang  <jiong.w...@arm.com>
>>>> >>>>>
>>>> >>>>>   gcc/
>>>> >>>>>     PR 63521
>>>> >>>>>     * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>> >>>>>     (HONOR_REG_ALLOC_ORDER): Define.
>>>> >>>
>>>> >>> Patch reverted.
>>>> >>
>>>> >> I did not see a reason why this patch was reverted. Maybe I am
>>>> >> missing an email or something.
>>>> >
>>>> > There are several execution regressions in the GCC testsuite. As
>>>> > far as I can see this patch merely exposed hidden bugs in those
>>>> > testcases, but there might be another issue, so to be conservative
>>>> > I temporarily reverted the patch.
>>>>
>>>> If you are talking about:
>>>>   gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>>> etc.
>>>>
>>>> These test cases are too dependent on the original register
>>>> allocation order and really can be safely ignored. Really these three
>>>> tests should be moved or written in a more sane way.
>>>
>>> Yup, completely agreed - but the testcases do throw up something
>>> interesting. If we are allocating registers to hold 128-bit values,
>>> and we pick x7 as highest preference, we implicitly allocate x8 along
>>> with it. I think we probably see the same thing if the first thing we
>>> do in a function is a structure copy through a back-end expanded
>>> movmem, which will likely begin with a 128-bit LDP using x7, x8.
>>>
>>> If the argument for this patch is that we prefer to allocate x7-x0
>>> first, followed by x8, then we've potentially made a sub-optimal
>>> decision: our allocation order for 128-bit values is x7, x8, x5, x6,
>>> etc.
>>>
>>> My hunch is that we *might* get better code generation in this corner
>>> case out of some permutation of the allocation order for argument
>>> registers. I'm thinking something along the lines of
>>>
>>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>>
>>> I asked Jiong to take a look at that, and I agree with his decision to
>>> reduce the churn on trunk and just revert the patch until we've come
>>> to a conclusion based on some evidence - rather than just my hunch! I
>>> agree that it would be harmless on trunk from a testing point of view,
>>> but I think Jiong is right to revert the patch until we better
>>> understand the code-generation implications.
>>>
>>> Of course, it might be that I am completely wrong!
>>> If you've already taken a look at using a register allocation order
>>> like the example I gave and have something to share, I'd be happy to
>>> read your advice!
>>
>> Any news on this patch? It has been a year since it was reverted for
>> a bad test that was failing.
>
> Hi Andrew,
>
> Yeah, those tests actually are expected to fail once the register
> allocation order changes; it's clearly documented in the comments in
> gcc.target/aarch64/aapcs64/abitest-2.h:
>
>   /* ...
>
>      Note that for value that is returned in the caller-allocated memory
>      block, we get the address from the saved x8 register. x8 is saved
>      just after the callee is returned; we assume that x8 has not been
>      clobbered at then, although there is no requirement for the callee
>      preserve the value stored in x8. Luckily, all test cases here are
>      simple enough that x8 doesn't normally get clobbered (although not
>      guaranteed). */
>
> I had a local fix which used the redundant value returned in x0 to
> repair the clobbered value in x8, as the two will be identical for a
> structure-type return. However, that trick can't be played any more:
> we recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which
> removes that redundant x8-to-x0 copy.
>
> Anyway, I will come back with some benchmark results of this patch on
> top of latest trunk after the weekend run, and also answers to James's
> concerns.
There is a very tiny performance regression on SPEC2K6INT 464.h264ref.
Checking the codegen, there is some bad instruction scheduling; it looks
to me to be caused by REG_ALLOC_ORDER not being used consistently inside
IRA: parts of the code use the new allocation order while other parts
still use the original order.

I see in ira.c:setup_class_hard_regs we check whether REG_ALLOC_ORDER is
defined:

#ifdef REG_ALLOC_ORDER
      hard_regno = reg_alloc_order[i];
#else
      hard_regno = i;
#endif

but in ira-color.c:assign_hard_reg, there is no similar check:

  /* We don't care about giving callee saved registers to allocnos no
     living through calls because call clobbered registers are
     allocated first (it is usual practice to put them first in
     REG_ALLOC_ORDER).  */
  mode = ALLOCNO_MODE (a);
  for (i = 0; i < class_size; i++)
    {
      hard_regno = ira_class_hard_regs[aclass][i];

We might want to use reg_alloc_order to map the above "i" if
REG_ALLOC_ORDER is defined?

Vlad, any comments?

--
Regards,
Jiong