[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

olegendo at gcc dot gnu.org via Gcc-bugs Mon, 22 Jan 2024 19:27:24 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533


--- Comment #11 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #10)

> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.  For register, and reg+reg addressing modes there is no size increase
> (overhead), and for addressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I've added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on the side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically the same execution time.  However, e.g. @(reg+reg) has a
constraint on 'r0' usage, so I weighted that heavier.  If there's anything that
could use @(reg+disp) as an alternative, that'd be better in some cases (not
sure if such optimizations are actually done...).

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COSTS_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in usual
cases it does that, only that apparently I've screwed up the {SIGN|ZERO}_EXTEND
for the case of the mem load and it shows up only now, many years later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size.  What effect does it have on the
generated code?  I can't imagine how it could possibly produce smaller
code.

With your patch, in case of the SIGN_EXTEND with mem operand, it would make the
address cost 0 with -Os, which would return COSTS_N_INSNS(1) for reg operand as
well as mem operand.  So both insns are equally weighted and could be
considered interchangeable.  And we might bump into this type of regression
again, if some (future) optimization decides that it can interchange/substitute
insns of the same cost... 
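
To make the tie concrete, here is a minimal sketch in C.  COSTS_N_INSNS is
defined the same way as in GCC's rtl.h; the two toy_* functions are
hypothetical stand-ins, not the real sh_rtx_costs logic.  They only model
"one insn" versus "one insn plus the address cost".

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h: one insn is worth 4 cost units,
   which leaves room for small tie-breakers below a whole insn.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: cost of a sign extension whose operand is a register.  */
int
toy_extend_reg_cost (void)
{
  return COSTS_N_INSNS (1);
}

/* Hypothetical: cost of a sign-extending load, modelled as one insn
   plus whatever the address cost hook reports.  */
int
toy_extend_mem_cost (int address_cost)
{
  assert (address_cost >= 0);
  return COSTS_N_INSNS (1) + address_cost;
}
```

With an address cost of 0 both functions return 4, so nothing distinguishes
the reg form from the mem form.  Any non-zero address cost breaks the tie
without making the load look like a second insn.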


> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns at least a cost of '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So actually some multiplications
could be faster than stitched shifts.
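
As a rough illustration of the stitched shifts, the sketch below counts how
many fixed left shifts are needed for a given shift amount, assuming only
shifts by 1, 2, 8 and 16 are available (shll, shll2, shll8, shll16).  The
function name is hypothetical and the model is simplified: the real sequences
in sh.cc may differ, and actual multiply latency depends on the CPU.

```c
#include <assert.h>

/* Hypothetical helper: minimum number of fixed left-shift insns needed
   to shift by N bits when only shifts by 1, 2, 8 and 16 exist.
   Simplified model, left shifts only.  */
int
toy_fixed_shift_insns (int n)
{
  static const int amounts[] = { 16, 8, 2, 1 };
  int best[32];

  assert (n >= 0 && n < 32);

  best[0] = 0;
  for (int i = 1; i <= n; i++)
    {
      /* Upper bound: i single-bit shifts.  */
      best[i] = i;
      for (int k = 0; k < 4; k++)
        if (amounts[k] <= i && best[i - amounts[k]] + 1 < best[i])
          best[i] = best[i - amounts[k]] + 1;
    }
  return best[n];
}
```

Under this model a shift by 8 is a single insn, but a shift by 7 takes four
(2 + 2 + 2 + 1), which is the kind of case where a multiplication could
compete.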


> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode 
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either an expanded sequence that
calculates the sum of partial products or a libcall, AFAIR.
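
For reference, the partial-product expansion looks roughly like the C sketch
below.  It computes the low 64 bits of a 64 x 64 multiplication using only a
32 x 32 -> 64 bit widening multiply.  The function names are hypothetical and
this is not the code GCC actually emits, just the arithmetic behind it.

```c
#include <stdint.h>

/* The only multiply primitive assumed: 32 x 32 -> 64 bit widening.  */
uint64_t
toy_widen_mul32 (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}

/* Low 64 bits of a * b from partial products.  With
   a = ah * 2^32 + al and b = bh * 2^32 + bl, the ah * bh term only
   affects bits 64 and up, so it is dropped.  */
uint64_t
toy_muldi3 (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);

  uint64_t low = toy_widen_mul32 (al, bl);

  /* Only the low 32 bits of each cross term survive the shift by 32.  */
  uint32_t cross = (uint32_t) toy_widen_mul32 (al, bh)
                   + (uint32_t) toy_widen_mul32 (ah, bl);

  return low + ((uint64_t) cross << 32);
}
```

That is three multiplies plus a few additions for one DImode multiplication,
so pricing it the same as an SImode multiplication understates the work.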
