On Wed, 2020-08-26 at 22:23 +0100, Roger Sayle wrote:
> The failure of slsr-13.c is not caused by the patchh3.txt, but the previous
> patchh2.txt that's now on mainline and the gcc-10 branch. That change
> provided more accurate rtx_costs for hppa, and solved the performance
> problems with synth_mult.
>
> These more accurate target rtx_costs are used by the
> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to decide
> whether applying strength reduction would be profitable. This test case,
> slsr-13.c, assumes that two multiplications by four are cheaper than two
> multiplications by five. (I believe) This is not the case on hppa which
> has a sh2add instruction, that performs a multiplication by five in one
> cycle, or exactly the same cost as performing a left shift by two (i.e. a
> multiplication by four). Oddly, I also believe this isn't the case on
> x86_64, where the similar lea instruction is (sometimes) as efficient as
> left shift by two bits.

Yea, you can do a multiplication by 5 cheap on the PA. While the x86 can too,
I don't think it's as clear cut a win as the PA, so they may not cost the same
as a multiply by 4 or left shift by 2.
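
To make the arithmetic concrete, here is a minimal C sketch. It assumes
sh2add computes (a << 2) + b; the function names are made up for this
illustration, and it models only the arithmetic, not instruction timings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PA's sh2add (an assumption for illustration):
   shift the first operand left by two bits and add the second. */
static uint32_t sh2add_model(uint32_t a, uint32_t b)
{
    return (a << 2) + b;
}

/* x * 5 expressed as a single sh2add: (x << 2) + x. */
static uint32_t mul5(uint32_t x)
{
    return sh2add_model(x, x);
}

/* x * 4 expressed as a plain left shift by two. */
static uint32_t mul4(uint32_t x)
{
    return x << 2;
}

/* Check both identities over a small range. Unsigned arithmetic wraps,
   so the same identities also hold when the product overflows. */
static void check_identities(void)
{
    for (uint32_t x = 0; x < 1000; ++x) {
        assert(mul5(x) == x * 5u);
        assert(mul4(x) == x * 4u);
    }
}
```

Each of mul5 and mul4 corresponds to one operation in this model, which is
the sense in which the two multiplications can cost the same.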
>
> I suspect that slsr-13.c should be expected to fail on some platforms
> depending upon a target's instruction set/timings.

Sounds like you're right since it depends on mult_by_coeff_cost under the
hood :( I presume you or John will xfail it for the PA.

>
> Unfortunately, to complicate things in our case, it appears that after RTL
> optimizations, performing this strength reduction actually does result in
> fewer instructions on the PA, so it's the right thing to do. I'll need to
> study the logic in gimple-ssa-strength to see how mult_by_coeff_cost is
> being used; cost(x*4) == cost(x*5), but cost(x*4+y) < cost(x*5+y).

Yea, x*4+y is cheaper than x*5+y on the PA. The first is a single sh2add, the
second requires an additional add instruction.

Jeff
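
The instruction counts discussed above can be sketched as a toy function in
C. This is not GCC's mult_by_coeff_cost; toy_insn_count is a hypothetical
helper that assumes a target with a one-instruction shift, a one-instruction
sh2add and a one-instruction add, and it only handles the coefficients 4
and 5.

```c
#include <stdbool.h>

/* Toy instruction count for x*coeff, or x*coeff + y when add_y is true.
   Assumptions (not taken from GCC): shift, sh2add and add are one
   instruction each. Returns -1 for coefficients this sketch does not model. */
static int toy_insn_count(int coeff, bool add_y)
{
    switch (coeff) {
    case 4:
        /* x*4 is one shift; x*4 + y is one sh2add x,y. */
        return 1;
    case 5:
        /* x*5 is one sh2add x,x; adding y needs a separate add. */
        return add_y ? 2 : 1;
    default:
        return -1;
    }
}
```

Under these assumptions toy_insn_count(4, false) and toy_insn_count(5, false)
are equal, while toy_insn_count(4, true) is smaller than
toy_insn_count(5, true), matching the cost relations quoted above.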