Hi Jeff,
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't see analogous code for the caller side.
>
> Similarly while you fix things for arguments in expand_expr_real_1, that's
> again just the callee side.  Don't you need to do something on the caller
> side too?

I've taken a pragmatic approach for this fix to PR target/104489: this
patch only needs to modify/fix the parts of the middle-end that are broken.
With this patch, GCC can compile the following with
-O2 -misa=sm_80 -ffast-math:

_Float16 p;
_Float16 q;
_Float16 r;

_Float16 foo(_Float16 x, _Float16 y)
{
  return x * y;
}

_Float16 mid(_Float16 x, _Float16 y)
{
  return foo(x,y) + foo(y,x);
}

void bar()
{
  p = mid(q,r);
}

which I assume covers all of the paths that I/we need to care about.

Technically, the blocker is that without this patch, GCC's build fails in
libgcc (compiling __mulhc3) when/if HFmode is enabled by default.
I'm hoping any remaining issues, not caught by the current testsuite, can be
handled as regular Bugzilla PRs to be fixed/added to the testsuite.
Let me know if there's anything I've missed or need to worry about.

I believe most PC laptops/desktops contain Nvidia graphics cards, so it's
relatively easy for GCC developers to try things out (on real hardware)
for themselves.

Cheers,
Roger
--

> -----Original Message-----
> From: Jeff Law <jeffreya...@gmail.com>
> Sent: 14 March 2022 15:30
> To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
>
> On 2/9/2022 1:12 PM, Roger Sayle wrote:
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example is the nvptx backend where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of the same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode, and likewise when going the
> > other way, parameters truncated to HImode then converted to HFmode
> > (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched precision integer/FP
> > conversions should be explicit in the RTL, and these semantics not
> > generally visible/implicit in user code.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and on nvptx-none, where it is
> > the middle-end portion of a pair of patches to allow the default ISA
> > to be advanced.  Ok for mainline?
> >
> > 2022-02-09  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > 	* cfgexpand.cc (expand_value_return): Allow backends to promote
> > 	a scalar floating point return value to a wider integer mode.
> > 	* expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
> > 	backends to promote scalar FP PARM_DECLs to wider integer modes.
>
> Buried somewhere in our calling conventions code is the ability to pass
> around BLKmode objects in registers along with the ability to tune left vs
> right padding adjustments.  Much of this support grew out of the PA
> 32 bit SOM ABI.
>
> While I think we could probably make those bits do what we want, I suspect
> the result will actually be uglier than what you've done here and I wouldn't
> be surprised if there was a performance hit as the code to handle those
> cases was pretty dumb in its implementation.
>
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't see analogous code for the caller side.
>
> Similarly while you fix things for arguments in expand_expr_real_1, that's
> again just the callee side.  Don't you need to do something on the caller
> side too?
>
> Jeff
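
For readers following the thread, the two-step promotion described in the quoted patch summary (reinterpret the FP bits as an integer of the same precision, then zero-extend to the wider integer, and the reverse on the way back) can be modeled in plain C. This is only a sketch under stated assumptions, not the code in the patch: it uses 32-bit float and a 64-bit integer as stand-ins for HFmode and SImode, because _Float16 is not available on every host compiler, it assumes IEEE 754 single precision, and the helper names promote_fp_bits and demote_fp_bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size, standing in for the
   HFmode/HImode pair discussed in the thread.  */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: promote an FP value to a wider integer in two steps.
   Step 1 reinterprets the bits as a same-width integer (the role the SUBREG
   plays in RTL); step 2 zero-extends to the wider integer.  */
static uint64_t promote_fp_bits(float value)
{
    uint32_t same_width;
    memcpy(&same_width, &value, sizeof same_width);  /* bit reinterpretation */
    return (uint64_t) same_width;                    /* zero-extension */
}

/* Hypothetical helper: the reverse direction.  Truncate the wide integer to
   the FP value's width, then reinterpret those bits as FP.  */
static float demote_fp_bits(uint64_t wide)
{
    uint32_t same_width = (uint32_t) wide;           /* truncation */
    float value;
    memcpy(&value, &same_width, sizeof value);       /* bit reinterpretation */
    return value;
}
```

The point of keeping the two steps separate is that the width change only ever happens between integers, so no floating point value is viewed directly through a wider container.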