>On 2020-08-26 5:23 p.m., Roger Sayle wrote:
>> These more accurate target rtx_costs are used by the 
>> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to 
>> decide whether applying strength reduction would be profitable.  This test 
>> case, slsr-13.c, assumes that two multiplications by four are cheaper
>> than two multiplications by five.  (I believe) This is not the case on
>> hppa which has a sh2add instruction, that performs a multiplication by five in 
>> one cycle, or exactly the same cost as performing a left shift by two 
>> (i.e. a multiplication by four).  Oddly, I also believe this isn't the 
>> case on x86_64, where the similar lea instruction is (sometimes) as 
>> efficient as left shift by two bits.
>This looks like a regression.
>
>gcc-10 (prepatch):
>
>        addl %r25,%r26,%r28
>        sh2addl %r25,%r28,%r25
>        sh2addl %r26,%r28,%r26
>        addl %r26,%r28,%r28
>        bv %r0(%r2)
>        addl %r28,%r25,%r28
>
>  <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  slsr_11 = s_3(D) * 4;
>  x2_6 = x1_4 + slsr_11;
>  slsr_12 = c_2(D) * 4;
>  x3_8 = x1_4 + slsr_12;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
>gcc-11 (with patch):
>
>        addl %r25,%r26,%r19
>        sh2addl %r26,%r26,%r28
>        addl %r28,%r25,%r28
>        sh2addl %r25,%r25,%r25
>        addl %r28,%r19,%r28
>        addl %r25,%r26,%r26
>        bv %r0(%r2)
>        addl %r28,%r26,%r28
>
>  <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  a2_5 = s_3(D) * 5;
>  x2_6 = c_2(D) + a2_5;
>  a3_7 = c_2(D) * 5;
>  x3_8 = s_3(D) + a3_7;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
> Regards,
> Dave

There are two interesting (tree-optimization) observations here.  The first is
that at the tree-ssa level both of these gimple sequences appear to have exactly
the same cost: seven assignments on a target where *4 costs the same as *5.
The gimple doesn't attempt to model the sh?add/lea instructions that combine
may find, so at RTL expansion both sequences look equivalent.  One fix may be
to have gimple-ssa-strength-reduction.c simply prefer multiplications by 2, 4
and 8, even on targets that have a single-cycle "mul" instruction.

The second observation is the question of why tree-ssa-reassoc.c isn't doing
something here.  The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this
strength reduction test expects this to turn into
"tmp=s+c;  return tmp+(tmp+4*c)+(4*s+tmp)", which is clever and an improvement,
but overlooks the obvious reassociation 7*(s+c).  Indeed LLVM does this in
three instructions:

        tmp1 = s+c;
        tmp2 = tmp1<<3;
        return tmp2-tmp1;
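
As a sanity check, the three forms discussed above can be compared in plain C.
This is only a sketch: the function names are made up for illustration, and I
use unsigned arithmetic (an assumption, not necessarily what slsr-13.c does) so
that wraparound is well defined.

```c
#include <assert.h>

/* The expression the test case evaluates, as written above. */
unsigned f_original(unsigned s, unsigned c)
{
  return (s + c) + (s + 5 * c) + (5 * s + c);
}

/* The form the strength-reduction test expects. */
unsigned f_slsr(unsigned s, unsigned c)
{
  unsigned tmp = s + c;
  return tmp + (tmp + 4 * c) + (4 * s + tmp);
}

/* The fully reassociated form, 7*(s+c), as a shift and a subtract. */
unsigned f_reassoc(unsigned s, unsigned c)
{
  unsigned tmp1 = s + c;
  unsigned tmp2 = tmp1 << 3;
  return tmp2 - tmp1;
}

/* Asserts that all three forms agree on a small grid of inputs. */
void check_forms(void)
{
  for (unsigned s = 0; s < 64; s++)
    for (unsigned c = 0; c < 64; c++) {
      unsigned r = f_original(s, c);
      assert(r == f_slsr(s, c));
      assert(r == f_reassoc(s, c));
    }
}
```

All three compute 7*(s+c); they differ only in how many operations that takes.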

Although the PA backend is (mostly) innocent in this, the lowest-impact
fix/workaround is to have multiplications by 2, 4 and 8 return
COSTS_N_INSNS(1)-1, to indicate a preference when breaking ties.  I'll prepare
a patch.
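
To illustrate the tie-breaking idea (this is not the patch itself), here is a
minimal stand-alone sketch.  COSTS_N_INSNS mirrors the definition in GCC's
rtl.h; the helper mult_const_cost_sketch, its handling of 3, 5 and 9 as a
single sh?add, and the default cost are all hypothetical and are not taken from
the PA backend.

```c
/* Mirrors GCC's rtl.h: one instruction is four cost units, which
   leaves room to express preferences smaller than one instruction. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper, NOT the actual pa.c code: the cost of
   multiplying a register by a small constant. */
int mult_const_cost_sketch(int coeff)
{
  switch (coeff) {
  case 2: case 4: case 8:
    /* Plain shifts: one instruction, made slightly cheaper so that
       they win ties against the sh?add forms below. */
    return COSTS_N_INSNS(1) - 1;
  case 3: case 5: case 9:
    /* Assumed to be a single sh1add/sh2add/sh3add of a register
       with itself. */
    return COSTS_N_INSNS(1);
  default:
    /* Placeholder value for every other coefficient (made up). */
    return COSTS_N_INSNS(2);
  }
}
```

With costs like these, *4 (3 units) compares as cheaper than *5 (4 units), even
though both are a single instruction.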

Roger