[PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns

Maciej W. Rozycki Mon, 04 Jul 2022 07:12:22 -0700

We have unordered FP comparisons implemented as RTL insns that produce 
multiple machine instructions.  Such RTL insns are hard to match with a 
processor pipeline description and additionally there is a redundant 
SNEZ instruction produced on the result of these comparisons even though 
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 
1, e.g.:


long
flt (double x, double y)
{
  return __builtin_isless (x, y);
}

with `-O2 -fno-finite-math-only -ftrapping-math -fno-signaling-nans' 
gets compiled to:

        .globl  flt
        .type   flt, @function
flt:
        frflags a5
        flt.d   a0,fa0,fa1
        fsflags a5
        snez    a0,a0
        ret
        .size   flt, .-flt

because the middle end can't see through the UNSPEC operation unordered 
FP comparisons have been defined in terms of.

These instructions are only produced via an expander already, so change 
the expander to emit individual RTL insns for each machine instruction 
in the ultimate ultimate sequence produced rather than deferring to a 
single RTL insn producing the whole sequence at once.

        gcc/
        * config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
        (QUIET_PATTERN): New int attribute.
        (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended 
        RTL insns entirely within the preparation statements.
        (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
        (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove 
        insns.
        (*riscv_fsnvsnan<mode>2): New insn.

        gcc/testsuite/
        * gcc.target/riscv/fle-ieee.c: New test.
        * gcc.target/riscv/fle-snan.c: New test.
        * gcc.target/riscv/fle.c: New test.
        * gcc.target/riscv/flef-ieee.c: New test.
        * gcc.target/riscv/flef-snan.c: New test.
        * gcc.target/riscv/flef.c: New test.
        * gcc.target/riscv/flt-ieee.c: New test.
        * gcc.target/riscv/flt-snan.c: New test.
        * gcc.target/riscv/flt.c: New test.
        * gcc.target/riscv/fltf-ieee.c: New test.
        * gcc.target/riscv/fltf-snan.c: New test.
        * gcc.target/riscv/fltf.c: New test.
---
Hi Kito,

 Thank you for your review.

> Thanks for detail analysis and performance number report, I am concern
> about this patch might let compiler schedule the fsflags/frflags with
> other floating point instructions, and the major issue is we didn't
> model fflags right in GCC as you mentioned in the first email.

 I have now looked through various places and we're good.

 First of all the C language standard has this:

"F.8.1 Environment management

"IEC 60559 requires that floating-point operations implicitly raise 
floating-point exception status flags, and that rounding control modes can 
be set explicitly to affect result values of floating-point operations. 
When the state for the FENV_ACCESS pragma (defined in <fenv.h>) is "on", 
these changes to the floating-point state are treated as side effects 
which respect sequence points."

We don't actually support the FENV_ACCESS pragma, however we provide for 
having FP environment support in the C library (e.g. `riscv_getflags' and 
`riscv_setflags' among others is the RISC-V port of glibc):

"  * 'The default state for the 'FENV_ACCESS' pragma (C99 and C11
     7.6.1).'

"    This pragma is not implemented, but the default is to "off" unless
     '-frounding-math' is used in which case it is "on"."

(I find this misleading, because my interpretation of our documentation 
and code is that the default is "on" whenever `-frounding-math' and 
`-ftrapping-math' are active both at a time; arguably the text is however 
correct, because `-ftrapping-math' is on by default, so it doesn't have to 
"be used" and it's `-fno-trapping-math' that has to "be unused" for the 
effect to be in place of what FENV_ACCESS "on" would do should we 
implement it).

 Now `riscv_getflags' and `riscv_setflags' use `asm volatile' to peek or 
poke at IEEE exception flags, so for these pieces of code to work the 
compiler has to make sure not to reorder volatile instructions around 
trapping instructions and that is handled in `can_move_insns_across':

          if (may_trap_or_fault_p (PATTERN (insn))
              && (trapping_insns_in_across
                  || other_branch_live != NULL
                  || volatile_insn_p (PATTERN (insn))))
            break;

(cf.: <https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00254.html>, and 
commit c6d851b95a7b).  And we consider FP arithmetic instructions trapping 
under `-ftrapping-math' in `may_trap_p_1':

    case NEG:
    case ABS:
    case SUBREG:
    case VEC_MERGE:
    case VEC_SELECT:
    case VEC_CONCAT:
    case VEC_DUPLICATE:
      /* These operations don't trap even with floating point.  */
      break;

    default:
      /* Any floating arithmetic may trap.  */
      if (FLOAT_MODE_P (GET_MODE (x)) && flag_trapping_math)
        return 1;

(other relevant operations are handled elsewhere with this switch 
statement).  This is consistent with my observations where only FLD or 
FABS.D (neither trapping) get reordered across FRFLAGS (volatile by means 
of `unspec_volatile').

 However following the observations above I chose to update the test cases 
to better reflect how we control IEEE exception handling and use 
`-fno-trapping-math' rather than `-ffinite-math-only' to disable the use 
of unordered comparison operations.

> So I think we should model this right before we split that, I guess
> that would be a bunch of work:
> 1. Add fflags to the hard register list.
> 2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
> (operands...))) to those floating point operations which might change
> fflags
> 3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL
> pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)).
> 4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
> *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern.
> 
> However I am not sure about the code gen impact of 2, especially the
> impact to the combine pass, not sure if you are interested to give a
> try?

 I think we can look into it long-term as a further optimisation, but to 
solve the problem of insn scheduling at hand my current proposal seems 
adequate enough and I suspect adding full data dependency tracking for the 
IEEE flags could be a rabbit hole (ISTM after all no other target does 
that, and I smell there's been a reason for that).

 FAOD if at all I'd envision doing such tracking individually for each
exception flag, following "Accumulating CSRs" listings from Section 14.3 
"Source and Destination Register Listings" in the unprivileged ISA spec.

 While re-reviewing code I have spotted I previously missed operand 
constraints for the new `*riscv_fsnvsnan<mode>2' insn, so I have added 
them now.

 I have verified that the new test cases still pass with the update in 
place, and I have rerun full `-fsignaling-nans' regression testing for the 
constraint fix.  OK to apply?

  Maciej

Changes from v1:

- Add operand constraints for the new `*riscv_fsnvsnan<mode>2' insn.

- In test cases use `-fno-trapping-math' rather than `-ffinite-math-only'; 
  consequently force `-fno-finite-math-only' so that any defaults do not 
  interfere with the results expected.
---
 gcc/config/riscv/riscv.md                  |   67 +++++++++++++++--------------
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef.c      |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf.c      |   12 +++++
 13 files changed, 179 insertions(+), 32 deletions(-)

gcc-riscv-fcmp-split.diff
Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -57,6 +57,7 @@
   ;; Floating-point unspecs.
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
+  UNSPECV_FSNVSNAN
 
   ;; Interrupt handler instructions.
   UNSPECV_MRET
@@ -360,6 +361,7 @@
 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
 (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET 
"le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET 
"LE")])
 
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
@@ -2326,39 +2328,31 @@
    (set_attr "mode" "<UNITMODE>")])
 
 (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
-   [(parallel [(set (match_operand:X      0 "register_operand")
-                   (unspec:X
-                    [(match_operand:ANYF 1 "register_operand")
-                     (match_operand:ANYF 2 "register_operand")]
-                    QUIET_COMPARISON))
-              (clobber (match_scratch:X 3))])]
-  "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
-   [(set (match_operand:X      0 "register_operand" "=r")
-        (unspec:X
-         [(match_operand:ANYF 1 "register_operand" " f")
-          (match_operand:ANYF 2 "register_operand" " f")]
-         QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 12))])
+   [(set (match_operand:X               0 "register_operand")
+        (unspec:X [(match_operand:ANYF 1 "register_operand")
+                   (match_operand:ANYF 2 "register_operand")]
+                  QUIET_COMPARISON))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+  rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+                                        UNSPECV_FRFLAGS);
+  rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+                                        UNSPECV_FSFLAGS);
 
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
-   [(set (match_operand:X      0 "register_operand" "=r")
-        (unspec:X
-         [(match_operand:ANYF 1 "register_operand" " f")
-          (match_operand:ANYF 2 "register_operand" " f")]
-         QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
-  
"frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 16))])
+  emit_insn (gen_rtx_SET (tmp, frflags));
+  emit_insn (gen_rtx_SET (op0, cmp));
+  emit_insn (fsflags);
+  if (HONOR_SNANS (<ANYF:MODE>mode))
+    emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+                                       gen_rtvec (2, op1, op2),
+                                       UNSPECV_FSNVSNAN));
+  DONE;
+})
 
 (define_insn "*seq_zero_<X:mode><GPR:mode>"
   [(set (match_operand:GPR       0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@
   "TARGET_HARD_FLOAT"
   "fsflags\t%0")
 
+(define_insn "*riscv_fsnvsnan<mode>2"
+  [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
+                    (match_operand:ANYF 1 "register_operand" "f")]
+                   UNSPECV_FSNVSNAN)]
+  "TARGET_HARD_FLOAT"
+  "feq.<fmt>\tzero,%0,%1"
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "<UNITMODE>")])
+
 (define_insn "riscv_mret"
   [(return)
    (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n"
 } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n"
 } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n"
 } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler 
"\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n"
 } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */

[PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns

Reply via email to