On Thu, Feb 11, 2016 at 7:14 AM, Jeff Law <l...@redhat.com> wrote: > On 02/09/2016 04:08 AM, Bin Cheng wrote: >> >> Hi, >> When counting cost for loop inv, GCC checks if a loop inv can be >> propagated into its use site (a memory reference). If it cannot be >> propagated, we increase its cost so that it's expensive enough to be hoisted >> out of loop. Currently we simply replace loop inv register in the use site >> with its definition expression, then call validate_changes to check if the >> result insn is valid. This is weak because validate_changes doesn't take >> canonicalization into consideration. Given below example: >> >> Loop inv def: >> 69: r149:SI=r87:SI+const(unspec[`xxxx'] 1) >> REG_DEAD r87:SI >> Loop inv use: >> 70: r150:SI=[r90:SI*0x4+r149:SI] >> REG_DEAD r149:SI >> >> The address expression after propagation is "r90 * 0x4 + (r87 + >> const(unspec[`xxxx']))". Function validate_changes simply returns false to >> it. As a matter of fact, the propagation is feasible if we canonicalize >> address expression into the form like "(r90 * 0x4 + r87) + >> const(unspec[`xxxx'])". >> >> This patch fixes the problem by canonicalizing address expression and >> verifying if the new addr is valid. The canonicalization follows GCC insn >> canonicalization rules. The test case from bugzilla PR is also included. >> As for the canonicalize_address interface, there is another >> canonicalize_address in fwprop.c which only changes shift into mult. I >> think it would be good to factor out a common RTL interface in GCC, but >> that's stage1 work. > > > Also note there's bits in combine that will canonicalize appropriate shifts > into mults. Clearly there's a need for some generalized routines to take a > fairly generic address and perform canonicalizations and simplifications on > it. > >> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> Thanks, >> bin >> >> 2016-02-09 Bin Cheng<bin.ch...@arm.com> >> >> PR tree-optimization/69052 >> * loop-invariant.c (canonicalize_address): New function. >> (inv_can_prop_to_addr_use): Check validity of address expression >> which is canonicalized by above function. >> >> gcc/testsuite/ChangeLog >> 2016-02-09 Bin Cheng<bin.ch...@arm.com> >> >> PR tree-optimization/69052 >> * gcc.target/i386/pr69052.c: New test. >> >> >> pr69052-20160204.txt >> >> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c >> index 707f044..157e273 100644 >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn >> *insn, bitmap depends_on, >> return inv; >> } >> >> +/* Returns a canonical version address for X. It identifies >> + addr expr in the form of A + B + C. Following instruction >> + canonicalization rules, MULT operand is moved to the front, >> + CONST operand is moved to the end; also PLUS operators are >> + chained to the left. */ >> + >> +static rtx >> +canonicalize_address (rtx x) >> +{ >> + rtx op0, op1, op2; >> + machine_mode mode = GET_MODE (x); >> + enum rtx_code code = GET_CODE (x); >> + >> + if (code != PLUS) >> + return x; >> + >> + /* Extract operands from A + B (+ C). */ >> + if (GET_CODE (XEXP (x, 0)) == PLUS) >> + { >> + op0 = XEXP (XEXP (x, 0), 0); >> + op1 = XEXP (XEXP (x, 0), 1); >> + op2 = XEXP (x, 1); >> + } >> + else if (GET_CODE (XEXP (x, 1)) == PLUS) >> + { >> + op0 = XEXP (x, 0); >> + op1 = XEXP (XEXP (x, 1), 0); >> + op2 = XEXP (XEXP (x, 1), 1); >> + } >> + else >> + { >> + op0 = XEXP (x, 0); >> + op1 = XEXP (x, 1); >> + op2 = NULL_RTX; >> + } >> + >> + /* Move MULT operand to the front. */ >> + if (!REG_P (op1) && !CONST_INT_P (op1)) >> + std::swap (op0, op1); > > This feels a bit hack-ish in the sense that you already know the form of the > RTL you're expecting and just assume that you'll be given something of that > form, but no more complex. > > ISTM you're better off walking the whole rtx, recording the tidbits as you > go into a vec. If you see something unexpected during that walk, you punt > canonicalization of the whole expression. > > You then sort the vec. You want to move things like MULT to the start and > all the constants to the end I think. > > You then do simplifications, particularly on the constants, but there may be > something useful to do with MULT terms as well. You could also arrange to > rewrite ASHIFTs into MULTs at this stage. > > Then you generate a new equivalent expression from the simplified operands > in the vec. > > You might look at tree-ssa-reassoc for ideas on implementation details. > > Initially just use it in the LICM code, but I think given that kind of > structure it'd be generally useful to replace bits of combine and fwprop > > If your contention is that only a few forms really matter, then I'd like to > see those forms spelled out better in the comment and some kind of checking > that we have reasonable incoming RTL. > > >> + >> + /* Move CONST operand to the end. */ >> + if (CONST_INT_P (op0)) >> + std::swap (op0, op1); > > You might want to check CONSTANT_P here. Maybe it doesn't matter in > practice, but things like (plus (plus (symbol-ref) (const_int) const_int)) > > That also gives you a fighting chance at extending this to handle > HIGH/LO_SUM which are going to appear on the RISCy targets. > > So I think the concept of making sure we're passing canonical RTL to the > verification step is good, I'm a bit concerned about the implementation of > the canonicalization step. Hi Jeff, Thanks for detailed review. I also think a generic canonical interface for RTL is much better. I will give it a try. But with high chance it's a next stage1 stuff.
Thanks, bin > > Jeff