Hi Roger!

On 2024-07-27T19:18:35+0100, "Roger Sayle" <ro...@nextmovesoftware.com> wrote:
> Firstly, thanks to Haochen Gui for recently adding optab support for
> isfinite and isnormal to the middle-end.

Do we, by the way, have documentation (I suppose that should be in
"GNU Compiler Collection (GCC) Internals"?) about the rationale and
subsequent optimization opportunities for having vs. not having
representations of "codes" (like, 'isfinite') in the various GCC IRs
etc., like builtins, internal functions, GIMPLE, optabs, RTL (..., and
I've probably missed some more)?

Of course, a lot of it can be inferred from the context or otherwise,
like having builtins corresponding to C library functions and then be
able to optimize according to their defined semantics, but others are not
always clear to me: like, why do we have 'copysign' RTL but not
'ifnormal'?

> This patch adds define_expand
> for both these functions to the nvptx backend, which conveniently has
> special instructions to simplify their implementation.

ACK.

> As this patch
> adds UNSPEC_ISFINITE and UNSPEC_ISNORMAL, I've also taken the opportunity
> to include/repost my tweak to clean-up/eliminate UNSPEC_COPYSIGN.

I'd seen your 2023 "Add RTX codes for [...] COPYSIGN", but not yet seen a
patch to use it for nvptx -- but indeed have stumbled over nvptx
'UNSPEC_COPYSIGN' a while ago; ACK.

> Previously, for isfinite, GCC on nvptx-none with -O2 would generate:
>
>                 mov.f64 %r26, %ar0;
>                 abs.f64 %r28, %r26;
>                 setp.gtu.f64    %r31, %r28, 0d7fefffffffffffff;
>                 selp.u32        %value, 0, 1, %r31;
>
> and with this patch, we now generate:
>
>                 mov.f64 %r23, %ar0;
>                 testp.finite.f64        %r24, %r23;
>                 selp.u32        %value, 1, 0, %r24;

Nice!

> Previously, for isnormal, GCC -O2 would generate:
>
>                 mov.f64 %r28, %ar0;
>                 abs.f64 %r22, %r28;
>                 setp.gtu.f64    %r32, %r22, 0d7fefffffffffffff;
>                 setp.ltu.f64    %r35, %r22, 0d0010000000000000;
>                 or.pred %r43, %r35, %r32;
>                 selp.u32        %value, 0, 1, %r43;
>
> and with this patch becomes:
>
>                 mov.f64 %r23, %ar0;
>                 setp.neu.f64    %r24, %r23, 0d0000000000000000;
>                 testp.normal.f64        %r25, %r23;
>                 and.pred        %r26, %r24, %r25;
>                 selp.u32        %value, 1, 0, %r26;
>
> Notice that although nvptx provides a testp.normal.f{32,64} instruction,
> the semantics don't quite match those required of libm [+0.0 and -0.0
> are considered normal by this instruction, but need to return false
> for __builtin_isnormal, hence the additional logic

Ugh.  ;-)

> which is still
> better than the original].

ACK.

> This patch has been tested on nvptx-none hosted by x86_64-pc-linux-gnu
> using make and make -k check, with only one new failure in the testsuite.
> The test case g++.dg/opt/pr107569.C exposes a latent bug in the middle-end
> (actually a missed optimization) as evrp fails to bound the results of
> isfinite.  This issue is independent of the back-end, as the tree-ssa
> evrp pass is run long before __builtin_finite is expanded by the backend,
> and the existence of an (any) isfinite optab is sufficient to expose it.
> Fortunately, Haochem Gui has already posted/proposed a fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657881.html
> [which I'm sad to see is taking a while to review/get approved].

Well, now this nvptx one here took me even longer to look into, so the
'g++.dg/opt/pr107569.C' regression is resolved by now.  ;-\

> Ok for mainline?

Just minor items: generally, I do like seeing logically separate changes
as separate commits (like, the 'copysign' cleanup is not conceptually
related to the 'isfinite', 'isnormal' enhancements).  However, that's my
own ambition; I do acknowledge that others do things differently, like
mixing in small cleanups with other changes.  Also, I personally strive
to go one step further with enhancing test suite coverage (for example,
move towards using 'check-function-bodies' instead of 'scan-assembler',
and first push the current/"bad" test case as its own commit, possibly
partly XFAILed, and as part of the code-changes commit then "fix up" the
test case, so that the latter changes are visible in the commit history).
But again, that's my own ambition; I do acknowledge that others do things
differently.

All that said, the patch is OK as is, with just one small enhancement,
see below.  Thank you!

> Thanks in advance (p.s. don't forget the nvptx_rtx_costs patch),

Aye-aye!

> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md

> +(define_insn "setcc_isnormal<mode>"
> +  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
> +     (unspec:BI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
> +                UNSPEC_ISNORMAL))]
> +  ""
> +  "%.\\ttestp.normal%t1\\t%0, %1;")
> +
> +(define_expand "isnormal<mode>2"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> +     (unspec:SI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
> +                UNSPEC_ISNORMAL))]
> +  ""
> +{
> +  rtx pred1 = gen_reg_rtx (BImode);
> +  rtx pred2 = gen_reg_rtx (BImode);
> +  rtx pred3 = gen_reg_rtx (BImode);
> +  rtx zero = CONST0_RTX (<MODE>mode);
> +  rtx cmp = gen_rtx_fmt_ee (NE, BImode, operands[1], zero);
> +  emit_insn (gen_cmp<mode> (pred1, cmp, operands[1], zero));
> +  emit_insn (gen_setcc_isnormal<mode> (pred2, operands[1]));
> +  emit_insn (gen_andbi3 (pred3, pred1, pred2));
> +  emit_insn (gen_setccsi_from_bi (operands[0], pred3));
> +  DONE;
> +})

Isn't it that this special "+/-0.0" handling conceptually belongs into
'setcc_isnormal<mode>' instead of 'isnormal<mode>2' -- or, simpler,
'setcc_isnormal<mode>' be renamed to 'setcc_nvptx_isnormal<mode>' (or
similar), to make it clear that this is to fix up that for PTX 'testp',
"As a special case, positive and negative zero are considered normal
numbers"?  Please add a comment to that effect to the code block in
'isnormal<mode>2', to help the next unaware reader (like, me, in a few
months).

Also, do you know whether we have execution test coverage for
'isnormal([+/-0.0])' (in the generic parts of the GCC test suite), or
should we add something?  (OK to do incrementally.)  I suppose we've got
something as part of 'gcc/testsuite/gcc.dg/c99-math.h',
'gcc/testsuite/gcc.dg/tg-tests.h',
'gcc/testsuite/gcc.dg/torture/floatn-tg.h' -- but these, curiously, all
seem to skip the '-0.0' case?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/isnormal.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int isnormal(double x)
> +{
> +  return __builtin_isnormal(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "testp.normal.f64" 1 } } */

For example, here, we could use 'check-function-bodies' to test the whole
PTX instruction sequence instead of just 'testp.normal.f64'.  (But again,
not a requirement; more like something to keep in mind going forward.)


Grüße
 Thomas

Reply via email to