Hi Jeff,

> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't see analogous code for the caller side.
> 
> Similarly, while you fix things for arguments in expand_expr_real_1, that's
> again just the callee side.  Don't you need to do something on the caller
> side too?

For this fix to PR target/104489, I've taken the pragmatic approach that the
patch only needs to modify/fix the parts of the middle-end that are actually
broken.  With this patch, GCC can compile the following with
-O2 -misa=sm_80 -ffast-math:

_Float16 p;
_Float16 q;
_Float16 r;

_Float16 foo(_Float16 x, _Float16 y)
{
  return x * y;
}

_Float16 mid(_Float16 x, _Float16 y)
{
  return foo(x,y) + foo(y,x);
}

void bar()
{
  p = mid(q,r);
}

which I assume covers all of the paths (caller and callee sides, for both
arguments and return values) that I/we need to care about.
Technically, the blocker is that without this patch, GCC's build fails
in libgcc (compiling __mulhc3) when/if HFmode is enabled by default.
I'm hoping any remaining issues, not caught by the current testsuite,
can be handled as regular Bugzilla PRs to be fixed/added to the
testsuite.
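For anyone wanting to reason about the two-step conversion described in the
original patch (quoted below), here is a rough model of it in plain C.  It is
only an analogy, not GCC code: it assumes HFmode is IEEE binary16 and that
float is IEEE binary32, represents the _Float16 value by its raw 16-bit
pattern (standing in for the HImode SUBREG, and avoiding any dependence on
host _Float16 support), and the function names are hypothetical.  The point
it illustrates is that the explicit zero-extension leaves the upper 16 bits
of the SImode value well defined, whereas the extra bits of a paradoxical
SUBREG would be undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an ABI that passes an HFmode (IEEE binary16)
   value in a 32-bit SImode register.  The FP value is represented by
   its raw 16-bit pattern, playing the role of the HImode SUBREG.  */

/* HImode -> SImode: explicit zero-extension, so the upper 16 bits
   are defined to be zero.  */
uint32_t promote_hf_bits (uint16_t hf_bits)
{
  return (uint32_t) hf_bits;
}

/* SImode -> HImode: truncation; the result would then be
   reinterpreted (via SUBREG) as HFmode.  */
uint16_t demote_to_hf_bits (uint32_t si_value)
{
  return (uint16_t) (si_value & 0xFFFFu);
}

/* Decode a binary16 bit pattern to float, to check the round trip.
   Assumes float is IEEE binary32.  */
float hf_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  float f;

  if (exp == 0)
    {
      /* Zero or subnormal: value is mant * 2^-24, exact in float.  */
      f = (float) mant / 16777216.0f;
      return sign ? -f : f;
    }
  if (exp == 0x1Fu)
    bits = sign | 0x7F800000u | (mant << 13);           /* Inf or NaN.  */
  else
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  /* Rebias 15 -> 127.  */
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

In the reverse direction, demote_to_hf_bits simply discards whatever is in
the upper 16 bits, mirroring the truncation to HImode that precedes the
SUBREG back to HFmode.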


Let me know if there's anything I've missed or need to worry about.
I believe most PC laptops/desktops contain Nvidia graphics cards, so it's
relatively easy for GCC developers to try things out (on real hardware)
for themselves.
 
Cheers,
Roger
--

> -----Original Message-----
> From: Jeff Law <jeffreya...@gmail.com>
> Sent: 14 March 2022 15:30
> To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
> 
> 
> 
> On 2/9/2022 1:12 PM, Roger Sayle wrote:
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example is the nvptx backend, where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of the same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode; likewise, when going the
> > other way, parameters are truncated to HImode and then converted to
> > HFmode (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched-precision integer/FP
> > conversions should be explicit in the RTL, and these semantics are not
> > generally visible/implicit in user code.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and on nvptx-none, where it is
> > the middle-end portion of a pair of patches to allow the default ISA
> > to be advanced.  Ok for mainline?
> >
> > 2022-02-09  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * cfgexpand.cc (expand_value_return): Allow backends to promote
> >         a scalar floating point return value to a wider integer mode.
> >         * expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
> >         backends to promote scalar FP PARM_DECLs to wider integer modes.
> 
> Buried somewhere in our calling conventions code is the ability to pass around
> BLKmode objects in registers, along with the ability to tune left vs. right
> padding adjustments.  Much of this support grew out of the PA 32-bit SOM
> ABI.
> 
> While I think we could probably make those bits do what we want, I suspect the
> result will actually be uglier than what you've done here and I wouldn't be
> surprised if there was a performance hit as the code to handle those cases was
> pretty dumb in its implementation.
> 
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't see analogous code for the caller side.
> 
> Similarly, while you fix things for arguments in expand_expr_real_1, that's
> again just the callee side.  Don't you need to do something on the caller
> side too?
> 
> Jeff
> 
