On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote: > Hi! > > On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote: > > Also use rs6000_cost only for speed. > > More directly: use something completely different for !speed, namely, > code size.
Yes, that might be better. > > - if (CONST_INT_P (XEXP (x, 1)) > > - && satisfies_constraint_I (XEXP (x, 1))) > > + if (!speed) > > + /* A little more than one insn so that nothing is tempted to > > + turn a shift left into a multiply. */ > > + *total = COSTS_N_INSNS (1) + 1; > > Please don't. We have a lot of code elsewhere to handle this directly, > already. Also, this is just wrong for size. Five shifts is *not* > better than four muls. If that is the only way to get good results, > than unfortunately we probably have to; but do not do this without any > proof. Huh. If a cost of 5 is "just wrong for size" then you prefer a cost of 12 for example (power9 mulsi or muldi rs6000_cost)? Noticing that result for !speed rs6000_rtx_costs is the entire basis for the !speed changes. I don't have any proof that this is correct. > > case FMA: > > - if (mode == SFmode) > > + if (!speed) > > + *total = COSTS_N_INSNS (1) + 1; > > Not here, either. > > > case DIV: > > case MOD: > > if (FLOAT_MODE_P (mode)) > > { > > - *total = mode == DFmode ? rs6000_cost->ddiv > > - : rs6000_cost->sdiv; > > + if (!speed) > > + *total = COSTS_N_INSNS (1) + 2; > > And why + 2 even? > > > - if (GET_MODE (XEXP (x, 1)) == DImode) > > + if (!speed) > > + *total = COSTS_N_INSNS (1) + 2; > > + else if (GET_MODE (XEXP (x, 1)) == DImode) > > *total = rs6000_cost->divdi; > > else > > *total = rs6000_cost->divsi; > > (more) OK, I can remove all the !speed changes. To be honest, I didn't look anywhere near as much at code size changes as I worried about performance. And about not regressing any fiddly testcase we have. > > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > > outer_code, > > return false; > > > > case AND: > > + *total = COSTS_N_INSNS (1); > > right = XEXP (x, 1); > > if (CONST_INT_P (right)) > > { > > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > > outer_code, > > || left_code == LSHIFTRT) > > && rs6000_is_valid_shift_mask (right, left, mode)) > > { > > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); > > - if (!CONST_INT_P (XEXP (left, 1))) > > - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, > > speed); > > - *total += COSTS_N_INSNS (1); > > + rtx reg_op = XEXP (left, 0); > > + if (!REG_P (reg_op)) > > + *total += rtx_cost (reg_op, mode, left_code, 0, speed); > > + reg_op = XEXP (left, 1); > > + if (!REG_P (reg_op) && !CONST_INT_P (reg_op)) > > + *total += rtx_cost (reg_op, mode, left_code, 1, speed); > > return true; > > } > > } > > - > > - *total = COSTS_N_INSNS (1); > > return false; > > This doesn't improve anything? It just makes it different from all > surrounding code? So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for regs, like it doesn't for const_int. > > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > > outer_code, > > if (outer_code == TRUNCATE > > && GET_CODE (XEXP (x, 0)) == MULT) > > { > > - if (mode == DImode) > > + if (!speed) > > + *total = COSTS_N_INSNS (1) + 1; > > (more) > > > + case SET: > > + /* The default cost of a SET is the number of general purpose > > + regs being set multiplied by COSTS_N_INSNS (1). That only > > + works where the incremental cost of the operation and > > + operands is zero, when the operation performed can be done in > > + one instruction. For other cases where we add COSTS_N_INSNS > > + for some operation (see point 5 above), COSTS_N_INSNS (1) > > + should be subtracted from the total cost. */ > > What does "incremental cost" mean there? If what increases? > > > + { > > + rtx_code src_code = GET_CODE (SET_SRC (x)); > > + if (src_code == CONST_INT > > + || src_code == CONST_DOUBLE > > + || src_code == CONST_WIDE_INT) > > + return false; > > + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed) > > + + rtx_cost (SET_DEST (x), mode, SET, 0, speed)); > > This should use set_src_cost, if anything. But that will recurse then, > currently? Ugh. > > Using rtx_cost for SET_DEST is problematic, too. > > What (if anything!) calls this for a SET? Oh, set_rtx_cost still does > that, hrm. > > rtx_cost costs RTL *expressions*. Not instructions. Except where it is > (ab)used for that, sigh. > > Many targets have something for it already, but all quite different from > this. Right, you are starting to understand just how difficult it is to do anything at all to rs6000_rtx_costs. > > + if (set_cost >= COSTS_N_INSNS (1)) > > + *total += set_cost - COSTS_N_INSNS (1); > > I don't understand this part at all, for example. Why not just > *total += set_cost - COSTS_N_INSNS (1); > ? If set_cost is lower than one insn's cost, don't we have a problem > already? The set_cost I calculate here from src and dest can easily be zero. (set (reg) (reg)) and (set (reg) (const_int 0)) for example have a dest cost of zero and a src cost of zero. That can't change without breaking places where rtx_costs is called to compare pieces of RTL. Here though we happen to be looking at a SET, so have an entire instruction. The value returned should be comparable to our instruction costs. That's tricky to do, and this change is just a hack. Without the hack I saw some testcases regress. I don't like this hack any more than you do reviewing it! > > Generic things. Please split this patch up when sending it again, it > does too many different things, and many of those are not obvious. > > All such changes that aren't completely obvious (like the previous ones > were) should have some measurement. We are in stage1, and we will > notice (non-trivial) degradations, but if we can expect degradations > (like for this patch), it needs benchmarking. Pat did benchmark these changes.. I was somewhat surprised to see a small improvement in spec results. > Since you add !speed all over the place, maybe we should just have a > separate function that does !speed? It looks like quite a few things > will simplify. Revised patch as follows. * config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs when insn operation cost handled on recursive call. Tidy break/return. Tidy AND costing. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6af8a9a31cb..26c2f443502 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -21397,7 +21397,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = rs6000_cost->fp; else *total = rs6000_cost->dmul; - break; + return false; case DIV: case MOD: @@ -21457,6 +21457,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, return false; case AND: + *total = COSTS_N_INSNS (1); right = XEXP (x, 1); if (CONST_INT_P (right)) { @@ -21469,15 +21470,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, || left_code == LSHIFTRT) && rs6000_is_valid_shift_mask (right, left, mode)) { - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); - if (!CONST_INT_P (XEXP (left, 1))) - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed); - *total += COSTS_N_INSNS (1); + rtx reg_op = XEXP (left, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, left_code, 0, speed); + reg_op = XEXP (left, 1); + if (!REG_P (reg_op) && !CONST_INT_P (reg_op)) + *total += rtx_cost (reg_op, mode, left_code, 1, speed); return true; } } - - *total = COSTS_N_INSNS (1); return false; case IOR: @@ -21575,7 +21576,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = rs6000_cost->fp; return false; } - break; + return false; case NE: case EQ: @@ -21613,13 +21614,40 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = 0; return true; } - break; + return false; + + case SET: + /* The default cost of a SET is the number of general purpose + regs being set multiplied by COSTS_N_INSNS (1). Here we + happen to be looking at a SET, so have an instruction rather + than just a piece of RTL and want to return a cost comparable + to rs6000 instruction costing. That's a little complicated + because in some cases the cost of SET operands is non-zero, + see point 5 above and cost of PLUS for example, and in + others it is zero, for example for (set (reg) (reg)). + But (set (reg) (reg)) actually costs the same as + (set (reg) (plus (reg) (reg))). Hack around this by + subtracting COSTS_N_INSNS (1) from the operand cost in cases + were we add COSTS_N_INSNS (1) for some operation. Don't do + so for constants that might cost more than zero because they + don't fit in one instruction. FIXME: rtx_costs should not be + looking at entire instructions. */ + { + rtx_code src_code = GET_CODE (SET_SRC (x)); + if (src_code == CONST_INT + || src_code == CONST_DOUBLE + || src_code == CONST_WIDE_INT) + return false; + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed) + + rtx_cost (SET_DEST (x), mode, SET, 0, speed)); + if (set_cost >= COSTS_N_INSNS (1)) + *total += set_cost - COSTS_N_INSNS (1); + return true; + } default: - break; + return false; } - - return false; } /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost. */ -- Alan Modra Australia Development Lab, IBM