>On 2020-08-26 5:23 p.m., Roger Sayle wrote:
>> These more accurate target rtx_costs are used by the
>> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
>> decide whether applying strength reduction would be profitable. This test
>> case, slsr-13.c, assumes that two multiplications by four are
>> cheaper than two multiplications by five. (I believe) This is not the case
>> on hppa which
>> has a sh2add instruction, that performs a multiplication by five in
>> one cycle, or exactly the same cost as performing a left shift by two
>> (i.e. a multiplication by four). Oddly, I also believe this isn't the
>> case on x86_64, where the similar lea instruction is (sometimes) as
>> efficient as left shift by two bits.
>This looks like a regression.
>
>gcc-10 (prepatch):
>
>        addl %r25,%r26,%r28
>        sh2addl %r25,%r28,%r25
>        sh2addl %r26,%r28,%r26
>        addl %r26,%r28,%r28
>        bv %r0(%r2)
>        addl %r28,%r25,%r28
>
>  <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  slsr_11 = s_3(D) * 4;
>  x2_6 = x1_4 + slsr_11;
>  slsr_12 = c_2(D) * 4;
>  x3_8 = x1_4 + slsr_12;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
>gcc-11 (with patch):
>
>        addl %r25,%r26,%r19
>        sh2addl %r26,%r26,%r28
>        addl %r28,%r25,%r28
>        sh2addl %r25,%r25,%r25
>        addl %r28,%r19,%r28
>        addl %r25,%r26,%r26
>        bv %r0(%r2)
>        addl %r28,%r26,%r28
>
>  <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  a2_5 = s_3(D) * 5;
>  x2_6 = c_2(D) + a2_5;
>  a3_7 = c_2(D) * 5;
>  x3_8 = s_3(D) + a3_7;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
> Regards,
> Dave

There are two interesting (tree-optimization) observations here.

The first is that at the tree-ssa level both of these gimple sequences
look to have exactly the same cost: seven assignments, on a target where
*4 is the same cost as *5. The gimple doesn't attempt to model the
sh?add/lea instructions that combine may find, so at RTL expansion both
sequences look equivalent. One fix may be to have
gimple-ssa-strength-reduction.c just prefer multiplications by 2, 4 and 8,
even on targets that have a single cycle "mul" instruction.

The second observation is: why isn't tree-ssa-reassoc.c doing something
here? The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this
strength reduction test is expecting this to turn into
"tmp=s+c; return tmp+(tmp+4*c)+(4*s+tmp)", which is clever and an
improvement, but overlooks the obvious reassociation 7*(s+c).
Indeed LLVM does this in three instructions:

        tmp1 = s+c;
        tmp2 = tmp1<<3;
        return tmp2-tmp1;

Although the PA backend is (mostly) innocent in this, the lowest impact
fix/workaround is to have multiplications by 2, 4 and 8 return
COSTS_N_INSNS(1)-1, to indicate a preference when splitting ties.

I'll prepare a patch.

Roger
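
As a sanity check on the arithmetic above, here is a minimal C sketch of
the three forms of the expression: as written in the test case, as the
strength reduction test expects it, and as the 7*(s+c) reassociation.
The function names are hypothetical and are not taken from slsr-13.c or
from GCC. Unsigned arithmetic is assumed so that wraparound and the left
shift are well defined.

```c
#include <assert.h>

/* The expression as the test case evaluates it. */
static unsigned sum_original(unsigned s, unsigned c)
{
    return (s + c) + (s + 5 * c) + (5 * s + c);
}

/* The form the strength reduction test expects:
   reuse tmp = s+c and multiply by 4 instead of 5. */
static unsigned sum_slsr(unsigned s, unsigned c)
{
    unsigned tmp = s + c;
    return tmp + (tmp + 4 * c) + (4 * s + tmp);
}

/* The reassociated form 7*(s+c), computed as (tmp<<3) - tmp. */
static unsigned sum_reassoc(unsigned s, unsigned c)
{
    unsigned tmp1 = s + c;
    unsigned tmp2 = tmp1 << 3;
    return tmp2 - tmp1;
}

/* Returns 1 if all three forms agree for every s, c in [0, limit). */
static int forms_agree(unsigned limit)
{
    for (unsigned s = 0; s < limit; s++)
        for (unsigned c = 0; c < limit; c++) {
            unsigned ref = sum_original(s, c);
            if (sum_slsr(s, c) != ref || sum_reassoc(s, c) != ref)
                return 0;
        }
    return 1;
}
```

Calling forms_agree(64) from a small main() should return 1. This only
checks that the three forms compute the same value; it says nothing
about which one a given target's cost model ought to prefer.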