On Wed, 2020-08-26 at 22:23 +0100, Roger Sayle wrote:
> The failure of slsr-13.c is not caused by patchh3.txt, but by the
> previous patchh2.txt, which is now on mainline and the gcc-10 branch.
> That change provided more accurate rtx_costs for hppa, and solved the
> performance problems with synth_mult.
> 
> These more accurate target rtx_costs are used by
> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
> decide whether applying strength reduction would be profitable.  This
> test case, slsr-13.c, assumes that two multiplications by four are
> cheaper than two multiplications by five.  I believe this is not the
> case on hppa, which has an sh2add instruction that performs a
> multiplication by five in one cycle, exactly the same cost as a left
> shift by two (i.e. a multiplication by four).  Oddly, I also believe
> this isn't the case on x86_64, where the similar lea instruction is
> (sometimes) as efficient as a left shift by two bits.
Yea, you can do a multiplication by 5 cheaply on the PA.  While the x86
can too, I don't think it's as clear-cut a win there as on the PA, so it
may not cost the same as a multiply by 4 or left shift by 2.
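
For illustration only, here is a minimal C sketch of the identity behind
that claim: x*5 is (x << 2) + x, which is the kind of shift-and-add that a
single sh2add (PA) or lea (x86) can compute.  The helper names are
hypothetical and are not taken from GCC or from slsr-13.c.

```c
#include <stdint.h>

/* Hypothetical helper: x * 4 as a plain left shift by two. */
static inline uint32_t mul4_shift(uint32_t x)
{
    return x << 2;
}

/* Hypothetical helper: x * 5 as one shift-and-add, (x << 2) + x.
   Unsigned arithmetic wraps, so this matches x * 5 modulo 2^32. */
static inline uint32_t mul5_shift_add(uint32_t x)
{
    return (x << 2) + x;
}
```

Whether the two forms really cost the same is a target question, which is
what the rtx_costs are supposed to answer.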


> 
> I suspect that slsr-13.c should be expected to fail on some platforms,
> depending upon a target's instruction set/timings.
Sounds like you're right since it depends on mult_by_coeff_cost under the
hood :(  I presume you or John will xfail it for the PA.


> 
> Unfortunately, to complicate things in our case, it appears that after
> RTL optimizations, performing this strength reduction actually does
> result in fewer instructions on the PA, so it's the right thing to do.
> I'll need to study the logic in gimple-ssa-strength to see how
> mult_by_coeff_cost is being used; cost(x*4) == cost(x*5), but
> cost(x*4+y) < cost(x*5+y).
Yea, x*4+y is cheaper than x*5+y on the PA.  The first is a single sh2add, the
second requires an additional add instruction.
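
As a rough sketch of that difference, assuming a C function sh2add() that
models the shift-two-and-add operation as (a << 2) + b.  The function
names here are made up for illustration and this is not how GCC computes
costs; it only counts the operations each form needs.

```c
#include <stdint.h>

/* Model of a shift-two-and-add operation: (a << 2) + b. */
static uint32_t sh2add(uint32_t a, uint32_t b)
{
    return (a << 2) + b;
}

/* x*4 + y: a single shift-two-and-add. */
static uint32_t mul4_plus(uint32_t x, uint32_t y)
{
    return sh2add(x, y);
}

/* x*5 + y: one shift-two-and-add to form x*5, then a separate add. */
static uint32_t mul5_plus(uint32_t x, uint32_t y)
{
    uint32_t t = sh2add(x, x);  /* t = x*5 */
    return t + y;               /* the additional add */
}
```

So a cost query for the bare multiply sees 4 and 5 as equal, while the
multiply feeding an add does not.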

Jeff
