Hi Roger! On 2024-07-27T19:18:35+0100, "Roger Sayle" <ro...@nextmovesoftware.com> wrote: > Firstly, thanks to Haochen Gui for recently adding optab support for > isfinite and isnormal to the middle-end.
Do we, by the way, have documentation (I suppose that should be in "GNU Compiler Collection (GCC) Internals"?) about the rationale and subsequent optimization opportunities for having vs. not having representations of "codes" (like, 'isfinite') in the various GCC IRs etc., like builtins, internal functions, GIMPLE, optabs, RTL (..., and I've probably missed some more)? Of course, a lot of it can be inferred from the context or otherwise, like having builtins corresponding to C library functions and then be able to optimize according to their defined semantics, but others are not always clear to me: like, why do we have 'copysign' RTL but not 'ifnormal'? > This patch adds define_expand > for both these functions to the nvptx backend, which conveniently has > special instructions to simplify their implementation. ACK. > As this patch > adds UNSPEC_ISFINITE and UNSPEC_ISNORMAL, I've also taken the opportunity > to include/repost my tweak to clean-up/eliminate UNSPEC_COPYSIGN. I'd seen your 2023 "Add RTX codes for [...] COPYSIGN", but not yet seen a patch to use it for nvptx -- but indeed have stumbled over nvptx 'UNSPEC_COPYSIGN' a while ago; ACK. > Previously, for isfinite, GCC on nvptx-none with -O2 would generate: > > mov.f64 %r26, %ar0; > abs.f64 %r28, %r26; > setp.gtu.f64 %r31, %r28, 0d7fefffffffffffff; > selp.u32 %value, 0, 1, %r31; > > and with this patch, we now generate: > > mov.f64 %r23, %ar0; > testp.finite.f64 %r24, %r23; > selp.u32 %value, 1, 0, %r24; Nice! > Previously, for isnormal, GCC -O2 would generate: > > mov.f64 %r28, %ar0; > abs.f64 %r22, %r28; > setp.gtu.f64 %r32, %r22, 0d7fefffffffffffff; > setp.ltu.f64 %r35, %r22, 0d0010000000000000; > or.pred %r43, %r35, %r32; > selp.u32 %value, 0, 1, %r43; > > and with this patch becomes: > > mov.f64 %r23, %ar0; > setp.neu.f64 %r24, %r23, 0d0000000000000000; > testp.normal.f64 %r25, %r23; > and.pred %r26, %r24, %r25; > selp.u32 %value, 1, 0, %r26; > > Notice that although nvptx provides a testp.normal.f{32,64} instruction, > the semantics don't quite match those required of libm [+0.0 and -0.0 > are considered normal by this instruction, but need to return false > for __builtin_isnormal, hence the additional logic Ugh. ;-) > which is still > better than the original]. ACK. > This patch has been tested on nvptx-none hosted by x86_64-pc-linux-gnu > using make and make -k check, with only one new failure in the testsuite. > The test case g++.dg/opt/pr107569.C exposes a latent bug in the middle-end > (actually a missed optimization) as evrp fails to bound the results of > isfinite. This issue is independent of the back-end, as the tree-ssa > evrp pass is run long before __builtin_finite is expanded by the backend, > and the existence of an (any) isfinite optab is sufficient to expose it. > Fortunately, Haochem Gui has already posted/proposed a fix at > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657881.html > [which I'm sad to see is taking a while to review/get approved]. Well, now this nvptx one here took me even longer to look into, so the 'g++.dg/opt/pr107569.C' regression is resolved by now. ;-\ > Ok for mainline? Just minor items: generally, I do like seeing logically separate changes as separate commits (like, the 'copysign' cleanup is not conceptually related to the 'isfinite', 'isnormal' enhancements). However, that's my own ambition; I do acknowledge that others do things differently, like mixing in small cleanups with other changes. Also, I personally strive to go one step further with enhancing test suite coverage (for example, move towards using 'check-function-bodies' instead of 'scan-assembler', and first push the current/"bad" test case as its own commit, possibly partly XFAILed, and as part of the code-changes commit then "fix up" the test case, so that the latter changes are visible in the commit history). But again, that's my own ambition; I do acknowledge that others do things differently. All that said, the patch is OK as is, with just one small enhancement, see below. Thank you! > Thanks in advance (p.s. don't forget the nvptx_rtx_costs patch), Aye-aye! > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > +(define_insn "setcc_isnormal<mode>" > + [(set (match_operand:BI 0 "nvptx_register_operand" "=R") > + (unspec:BI [(match_operand:SDFM 1 "nvptx_register_operand" "R")] > + UNSPEC_ISNORMAL))] > + "" > + "%.\\ttestp.normal%t1\\t%0, %1;") > + > +(define_expand "isnormal<mode>2" > + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") > + (unspec:SI [(match_operand:SDFM 1 "nvptx_register_operand" "R")] > + UNSPEC_ISNORMAL))] > + "" > +{ > + rtx pred1 = gen_reg_rtx (BImode); > + rtx pred2 = gen_reg_rtx (BImode); > + rtx pred3 = gen_reg_rtx (BImode); > + rtx zero = CONST0_RTX (<MODE>mode); > + rtx cmp = gen_rtx_fmt_ee (NE, BImode, operands[1], zero); > + emit_insn (gen_cmp<mode> (pred1, cmp, operands[1], zero)); > + emit_insn (gen_setcc_isnormal<mode> (pred2, operands[1])); > + emit_insn (gen_andbi3 (pred3, pred1, pred2)); > + emit_insn (gen_setccsi_from_bi (operands[0], pred3)); > + DONE; > +}) Isn't it that this special "+/-0.0" handling conceptually belongs into 'setcc_isnormal<mode>' instead of 'isnormal<mode>2' -- or, simpler, 'setcc_isnormal<mode>' be renamed to 'setcc_nvptx_isnormal<mode>' (or similar), to make it clear that this is to fix up that for PTX 'testp', "As a special case, positive and negative zero are considered normal numbers"? Please add a comment to that effect to the code block in 'isnormal<mode>2', to help the next unaware reader (like, me, in a few months). Also, do you know whether we have execution test coverage for 'isnormal([+/-0.0])' (in the generic parts of the GCC test suite), or should we add something? (OK to do incrementally.) I suppose we've got something as part of 'gcc/testsuite/gcc.dg/c99-math.h', 'gcc/testsuite/gcc.dg/tg-tests.h', 'gcc/testsuite/gcc.dg/torture/floatn-tg.h' -- but these, curiously, all seem to skip the '-0.0' case? > --- /dev/null > +++ b/gcc/testsuite/gcc.target/nvptx/isnormal.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +int isnormal(double x) > +{ > + return __builtin_isnormal(x); > +} > + > +/* { dg-final { scan-assembler-times "testp.normal.f64" 1 } } */ For example, here, we could use 'check-function-bodies' to test the whole PTX instruction sequence instead of just 'testp.normal.f64'. (But again, not a requirement; more like something to keep in mind going forward.) Grüße Thomas