Re: [PATCH] middle-end: Improve RTL expansion in expand_mul_overflow,

Richard Sandiford Thu, 09 Jul 2020 01:47:19 -0700

Jakub Jelinek <ja...@redhat.com> writes:
> On Thu, Jul 09, 2020 at 09:17:46AM +0100, Richard Sandiford wrote:
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -1627,6 +1627,9 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
>> >                                 profile_probability::very_likely ());
>> >      else
>> >        {
>> > +        /* RES is used more than once, place it in a pseudo.  */
>> > +        res = force_reg (mode, res);
>> > +
>> >          rtx signbit = expand_shift (RSHIFT_EXPR, mode, res, prec - 1,
>> >                                      NULL_RTX, 0);
>> >          /* RES is low half of the double width result, HIPART
>> 
>> In general, this can be dangerous performance-wise on targets where
>> subregs are free.  If the move survives to the register allocators,
>> it increases the risk that the move will become a machine insn.
>> (The RA will prefer to tie the registers, but that isn't guaranteed.)
>> 
>> But more fundamentally, this can hurt if the SUBREG_REG is live at
>> the same time as the new pseudo, since the RA then has to treat them
>> as separate quantities.  From your results, that obviously doesn't
>> occur in the test case, but I'm not 100% confident that it won't
>> occur elsewhere.
>> 
>> If target-independent code is going to optimise for “no subreg operand”
>> targets like nvptx, I think it needs to know that the target wants that.
>
> Isn't that though what the expander is doing in lots of places?
> Force operands into pseudos especially if optimize to hope for better CSE
> etc., and hope combine does its job to undo it when it is better to be
> propagated?
> It is true that if res is used several times, then combine will not
> propagate it due to multiple uses, so the question I guess is why as Roger
> says we get the same code before/after (which pass undoes that; RA?).


I'd imagine fwprop.  That's just a guess though.

But I don't see what this force_reg achieves on “free subreg” targets,
even for CSE.  The SUBREG_REG is still set as a full REG value and
can be CSEd in the normal way.  And because the subreg itself is free,
trying to CSE the subregs is likely to be actively harmful for the
reason above: we can then have the SUBREG_REG and the CSEd subreg
live at the same time.

I.e. it's not usually worth extending the lifetime of a previous
subreg result when a new subreg on the same value would have no cost.
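
To make the live-range point concrete, here is a toy model in C.  It is
not GCC code: the struct, the function names and the insn numbers are
all hypothetical, and the real allocators are far more involved.  It only
shows that a copy of the low half conflicts with the wide value whenever
the wide value is still needed after the copy is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, not GCC code.  A value is live from the insn that
   defines it up to its last use (hypothetical insn numbers).  */
struct live_range
{
  int def;
  int last_use;
};

/* Two values conflict, and so need separate registers, if each one is
   defined before the other's last use.  */
static bool
ranges_conflict (struct live_range a, struct live_range b)
{
  assert (a.def <= a.last_use && b.def <= b.last_use);
  return a.def < b.last_use && b.def < a.last_use;
}

/* Number of distinct quantities to allocate for a wide value and an
   optional copy of its low half.  COPY is NULL when the low half is
   read through a free subreg instead of being moved to a new pseudo.  */
static int
quantities_to_allocate (struct live_range wide,
                        const struct live_range *copy)
{
  if (copy == NULL)
    return 1;
  return ranges_conflict (wide, *copy) ? 2 : 1;
}
```

With a wide value live over insns 1 to 5 and a copy live over insns 2
to 4, the model reports two quantities; with no copy it reports one.
If the wide value dies at the insn that makes the copy, the two ranges
do not conflict, which corresponds to the case where the registers can
be tied.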

Maybe in the old days it made more sense, because we relied on RTL
optimisers to do constant propagation and folding, and so a subreg
move might later become a constant move, with the constant then being
CSEable.  Is that what you mean?  But that should be much less necessary
now.  There shouldn't be many cases in which only the RTL optimisers can
prove that an internal function argument is constant.
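
For reference, a minimal sketch of source code that reaches this
expander: __builtin_mul_overflow is the GCC builtin whose internal
function is expanded by expand_mul_overflow.  The wrapper name
mul_overflows is hypothetical.  With non-constant arguments the call
has to be expanded to RTL; with constant arguments I would expect it
to be folded well before expansion, though that is an assumption
rather than something checked for this thread.

```c
#include <stdbool.h>

/* Return true if A * B overflows int.  The product, wrapped to the
   width of int, is stored in *RES either way.  */
static bool
mul_overflows (int a, int b, int *res)
{
  return __builtin_mul_overflow (a, b, res);
}
```

Calling mul_overflows (3, 4, &r) returns false and sets r to 12, while
a product that does not fit in int makes it return true.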

Thanks,
Richard
