Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Richard Sandiford Thu, 05 Dec 2024 05:24:38 -0800

Richard Biener <rguent...@suse.de> writes:
> On Fri, 29 Nov 2024, Jakub Jelinek wrote:
>
>> On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote:
>> > For a TSVC testcase we see failed register coalescing due to a
>> > different schedule of GIMPLE .FMA and stores fed by it.  This
>> > can be mitigated by making direct internal functions participate
>> > in TER - given we're using more and more of such functions to
>> > expose target capabilities it seems to be a natural thing to not
>> > exempt those.
>> > 
>> > Unfortunately the internal function expanding API doesn't match
>> > what we usually have - passing in a target and returning an RTX
>> > but instead the LHS of the call is expanded and written to.  This
>> > makes the TER expansion of a call SSA def a bit unwieldly.
>> 
>> Can't we change that?
>> Especially if it is only for the easiest subset of internal fns
>> (I see you limit it only to direct_internal_fn_p), if it has just
>> one or a couple of easy implementations, those could be split into
>> one which handles the whole thing by just expanding lhs and calling
>> another function with the rtx target argument into which to store
>> stuff (or const0_rtx for ignored result?) and handle the actual expansion,
>> and then have an exported function from internal-fn.cc which expr.cc
>> could call for the TERed internal-fn case.
>> That function could assert it is only direct_internal_fn_p or some
>> other subset which it would handle.
>
> The expander goes through macro-generated expand_FOO (see top of
> internal-fn.cc), and in the end dispatches to expand_*_optab_fn
> of which there is a generic one for UNARY, BINARY and TERNARY
> but very many OPTAB_NAME variants, like expand_fold_len_extract_optab_fn
> dispatching to expand_direct_optab_fn or complex ones like
> expand_gather_load_optab_fn.  There's unfortunately no good way
> to factor out a different API there, at least not easily.


Converting everything to a new, single API seems feasible as
far as the refactoring goes.  But then I wonder what the API
should be.

One option would be:

  void expand_FOO (internal_fn fn, gcall *call, rtx lhs_rtx)

where the rtx is provided by the caller and is nonnull whenever
the lhs is needed.  (This is different from the usual "target"
argument.)  For this option, the dispatcher would use a single:

  lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);

when no lhs is provided by the caller.

But this option assumes that the store to the lhs can always be done
on rtl, with (for example) none of the special casing in expand_assignment.
That's probably true now, but it would be a difficult assumption to
reverse later.

Another option would be the usual:

  rtx expand_FOO (internal_fn fn, gcall *call, rtx target)

which returns the result as an rtx, with TARGET just being
a suggestion.  But as discussed, this doesn't really fit the
semantics of things like LOAD_LANES and STORE_LANES very well,
where the lhs is an aggregate.

More generally, I suppose the question with this option is: when
can the expander ignore the call's lhs?  Always, with the caller
handling the fallout?  When target is nonnull?  When the lhs is
an SSA_NAME?  Or something else?  The first option complicates
callers, and doesn't fit LOAD/STORE_LANES well, whereas the others
effectively create two cases for each function.

IMO, the current interface is the natural one for its current callers
(both in cfgexpand).  And if TER is something that we want to remove,
maybe it's not worth changing everything over to a different interface
just for that.

So having played around with a few possibilities locally, I've kind-of
grown to like the patch :)

Thanks,
Richard

Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Reply via email to