On Thu, Aug 27, 2020 at 9:17 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> >On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> >> These more accurate target rtx_costs are used by the
> >> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
> >> decide whether applying strength reduction would be profitable.  This test
> >> case, slsr-13.c, assumes that two multiplications by four are
> >> cheaper than two multiplications by five.  (I believe) This is not the
> >> case on hppa which
> >> has a sh2add instruction, that performs a multiplication by five in
> >> one cycle, or exactly the same cost as performing a left shift by two
> >> (i.e. a multiplication by four).  Oddly, I also believe this isn't the
> >> case on x86_64, where the similar lea instruction is (sometimes) as
> >> efficient as left shift by two bits.
> >This looks like a regression.
> >
> >gcc-10 (prepatch):
> >
> >        addl %r25,%r26,%r28
> >        sh2addl %r25,%r28,%r25
> >        sh2addl %r26,%r28,%r26
> >        addl %r26,%r28,%r28
> >        bv %r0(%r2)
> >        addl %r28,%r25,%r28
> >
> >  <bb 2> [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  slsr_11 = s_3(D) * 4;
> >  x2_6 = x1_4 + slsr_11;
> >  slsr_12 = c_2(D) * 4;
> >  x3_8 = x1_4 + slsr_12;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> >gcc-11 (with patch):
> >
> >        addl %r25,%r26,%r19
> >        sh2addl %r26,%r26,%r28
> >        addl %r28,%r25,%r28
> >        sh2addl %r25,%r25,%r25
> >        addl %r28,%r19,%r28
> >        addl %r25,%r26,%r26
> >        bv %r0(%r2)
> >        addl %r28,%r26,%r28
> >
> >  <bb 2> [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  a2_5 = s_3(D) * 5;
> >  x2_6 = c_2(D) + a2_5;
> >  a3_7 = c_2(D) * 5;
> >  x3_8 = s_3(D) + a3_7;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> > Regards,
> > Dave
>
> There are two interesting (tree-optimization) observations here.
> The first is that at the tree-ssa level both of these gimple sequences
> look to have exactly the same cost, seven assignments on a target where
> *4 is the same cost as *5.  The gimple doesn't attempt to model the
> sh?add/lea instructions that combine may find, so at RTL expansion both
> sequences look equivalent.  One fix may be to have
> gimple-ssa-strength-reduction.c just prefer multiplications by 2, 4 and 8,
> even on targets that have a single cycle "mul" instruction.
>
> The second observation is why tree-ssa-reassoc.c isn't doing something
> here.  The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this
> strength reduction test is expecting this to turn into
> "tmp=s+c; return tmp+(tmp+4*c)+(4*s+tmp)" which is clever and an
> improvement, but overlooks the obvious reassociation 7*(s+c).  Indeed
> LLVM does this in three instructions:

reassoc doesn't work on signed types

>
> tmp1 = s+c;
> tmp2 = tmp1<<3;
> return tmp2-tmp1;
>
> Although the PA backend is (mostly) innocent in this, the lowest impact
> fix/work around is to have multiplications by 2, 4 and 8 return
> COSTS_N_INSNS(1)-1, to indicate a preference when splitting ties.
> I'll prepare a patch.
>
> Roger
> --
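
For anyone who wants to check the arithmetic quoted above, here is a minimal sketch in C of the three forms under discussion. It is not taken from slsr-13.c, and the function names are made up for illustration. It also uses unsigned operands so that wraparound is well defined, which is an assumption of the sketch and sidesteps the signed-type question rather than answering it.

```c
/* Hypothetical helpers for illustration; none of these names come
   from slsr-13.c or from GCC itself.  */

/* The expression the test case is said to evaluate.  */
unsigned original_form(unsigned s, unsigned c)
{
    return (s + c) + (s + 5 * c) + (5 * s + c);
}

/* The form the strength reduction test expects: share tmp = s + c
   and multiply by four instead of five.  */
unsigned slsr_form(unsigned s, unsigned c)
{
    unsigned tmp = s + c;
    return tmp + (tmp + 4 * c) + (4 * s + tmp);
}

/* The reassociated form 7*(s+c), written as shift and subtract.  */
unsigned reassoc_form(unsigned s, unsigned c)
{
    unsigned tmp1 = s + c;
    unsigned tmp2 = tmp1 << 3;   /* 8 * (s + c) */
    return tmp2 - tmp1;          /* 8 * (s + c) - (s + c) */
}

/* Returns 1 when all three forms produce the same value.  */
int forms_agree(unsigned s, unsigned c)
{
    unsigned r = original_form(s, c);
    return r == slsr_form(s, c) && r == reassoc_form(s, c);
}
```

With s = 3 and c = 4 all three return 49, i.e. 7*(3+4). Because unsigned arithmetic is modular, the forms also agree when the intermediate sums wrap.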