Hi Carl,

Excepting for Peter's review comments, some nits are inline below.

on 2023/7/11 03:18, Carl Love wrote:
> 
> GCC maintainers:
> 
> Ver 3, Renamed the patch per comments on ver 2.  Previous subject line
> was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".  
> Fixed spelling mistakes and formatting.  Updated define_expand
> "rs6000_set_fpscr_rn to have the rs6000_get_fpscr_fields and
> rs6000_update_fpscr_rn_field define expands inlined.  Optimized the
> code and fixed use of temporary register values. Updated the test file
> dg-do run arguments and dg-options.  Removed the check for
> __SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
> overloaded built-in with double argument.  Fixed up the documentation
> file.  Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
> 10 LE.
> 
> Ver 2,  Went back thru the requirements and emails.  Not sure where I
> came up with the requirement for an overloaded version with double
> argument.  Removed the overloaded version with the double argument. 
> Added the macro to announce if the __builtin_set_fpscr_rn returns a
> void or a double with the FPSCR bits.  Updated the documentation file. 
> Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
> file.  Per request, the original test file functionality was not
> changed.  Just changed the name from test_fpscr_rn_builtin.c to 
> test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
> new test file, test_fpscr_rn_builtin_2.c.
> 
> The GLibC team requested a builtin to replace the mffscrn and
> mffscrniinline asm instructions in the GLibC code.  Previously there
> was discussion on adding builtins for the mffscrn instructions.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> 
> In the end, it was felt that it would be to extend the existing
> __builtin_set_fpscr_rn builtin to return a double instead of a void
> type.  The desire is that we could have the functionality of the
> mffscrn and mffscrni instructions on older ISAs.  The two instructions
> were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
> needed functionality to set the RN field using the mffscrn and mffscrni
> instructions if ISA 3.0 is supported or fall back to using logical
> instructions to mask and set the bits for earlier ISAs.  The
> instructions return the current value of the FPSCR fields DRN, VE, OE,
> UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
> the new RN value provided.
> 
> The current __builtin_set_fpscr_rn builtin has a return type of void. 
> So, changing the return type to double and returning the  FPSCR fields
> DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> functionally equivalent of the mffscrn and mffscrni instructions.  Any
> current uses of the builtin would just ignore the return value yet any
> new uses could use the return value.  So the requirement is for the
> change to the __builtin_set_fpscr_rn builtin to be backwardly
> compatible and work for all ISAs.
> 
> The following patch changes the return type of the
>  __builtin_set_fpscr_rn builtin from void to double.  The return value
> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
> XE, NI, RN bit positions when the builtin is called.  The builtin then
> updated the RN field with the new value provided as an argument to the
> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
> check that the builtin returns the current value of the FPSCR fields
> and then updates the RN field.
> 
> The GLibC team has reviewed the patch to make sure it met their needs
> as a drop in replacement for the inline asm mffscr and mffscrni
> statements in the GLibC code.  T
> 
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>                                Carl 
> 
> 
> 
> -------------------------------------------------
> rs6000, Add return value  to __builtin_set_fpscr_rn

Nit: One more unexpected space.

> 
> Change the return value from void to double for __builtin_set_fpscr_rn.
> The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
> RN bit positions.  A new test file, test powerpc/test_fpscr_rn_builtin_2.c,
> is added to test the new return value for the built-in.

Nit: It would be better to note the newly added __SET_FPSCR_RN_RETURNS_FPSCR__
in commit log as well.

> 
> gcc/ChangeLog:
>       * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
>       built-in definition return type.
>       * config/rs6000-c.cc (rs6000_target_modify_macros): Add check,
>       define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
>       * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return

Nit: s/Added/Add/

>       argument to return FPSCR fields.
>       * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
>       the return value.  Add description for
>       __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> 
> gcc/testsuite/ChangeLog:
>       gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
>       test_fpscr_rn_builtin_1.c.  Added comment.

Nit: s/Added/Add/ and s/Renamed/Rename/.

>       gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
>       return value of __builtin_set_fpscr_rn builtin.
> ---
>  gcc/config/rs6000/rs6000-builtins.def         |   2 +-
>  gcc/config/rs6000/rs6000-c.cc                 |   5 +
>  gcc/config/rs6000/rs6000.md                   |  56 ++++---
>  gcc/doc/extend.texi                           |  23 ++-
>  ...rn_builtin.c => test_fpscr_rn_builtin_1.c} |  10 +-
>  .../powerpc/test_fpscr_rn_builtin_2.c         | 151 ++++++++++++++++++
>  6 files changed, 215 insertions(+), 32 deletions(-)
>  rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => 
> test_fpscr_rn_builtin_1.c} (89%)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 289a37998b1..28788b69c7d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -237,7 +237,7 @@
>    const __ibm128 __builtin_pack_ibm128 (double, double);
>      PACK_IF packif {ibm128}
>  
> -  void __builtin_set_fpscr_rn (const int[0,3]);
> +  double __builtin_set_fpscr_rn (const int[0,3]);
>      SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
>  
>    const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 8555174d36e..07a32badb54 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -604,6 +604,11 @@ rs6000_target_modify_macros (bool define_p, 
> HOST_WIDE_INT flags)
>    /* Tell the user -mrop-protect is in play.  */
>    if (rs6000_rop_protect)
>      rs6000_define_or_undefine_macro (define_p, "__ROP_PROTECT__");
> +  /* Tell the user __builtin_set_fpscr_rn now returns the FPSCR fields
> +     in a double.  Originally the built-in returned void.  */
> +  if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> +    rs6000_define_or_undefine_macro (define_p,
> +                                  "__SET_FPSCR_RN_RETURNS_FPSCR__");
>  }
>  
>  void
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b0db8ae508d..1c1aab1af56 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6441,7 +6441,8 @@
>    [(set_attr "type" "fp")])
>  
>  (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:SI 0 "reg_or_cint_operand")]
> + [(use (match_operand:DF 0 "gpc_reg_operand"))
> +  (use (match_operand:SI 1 "reg_or_cint_operand"))]
>    "TARGET_HARD_FLOAT"
>  {
>    rtx tmp_df = gen_reg_rtx (DFmode);
> @@ -6450,25 +6451,41 @@
>       new rounding mode bits from operands[0][62:63] into FPSCR[62:63].  */
>    if (TARGET_P9_MISC)
>      {
> -      if (const_0_to_3_operand (operands[0], VOIDmode))
> -     emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0]));
> +      if (const_0_to_3_operand (operands[1], VOIDmode))
> +     emit_insn (gen_rs6000_mffscrni (tmp_df, operands[1]));
>        else
>       {
> -       rtx op0 = convert_to_mode (DImode, operands[0], false);
> -       rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0);
> +       rtx op1 = convert_to_mode (DImode, operands[1], false);
> +       rtx src_df = simplify_gen_subreg (DFmode, op1, DImode, 0);
>         emit_insn (gen_rs6000_mffscrn (tmp_df, src_df));
>       }
> -      DONE;
> +       emit_move_insn (operands[0], tmp_df);
> +       DONE;
>      }
>  
> -  if (CONST_INT_P (operands[0]))
> +  /* Emulate the behavior of the mffscrni, mffscrn instructions for earlier
> +     ISAs.  Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI,
> +     RN) from the FPSCR.  Set the RN field based on the value in operands[1].
> +  */
> +
> +  /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63 (VE, OE, 
> UE,
> +  ZE, XE, NI, RN) from the FPSCR and return them.  */
> +  rtx tmp_di1 = gen_reg_rtx (DImode);

Nit: This line is preferred to be move to below (a), close to its use.

> +
> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  rtx tmp_di2 = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);

Nit: May be good to rename this tmp_di2 as orig_df_in_di, hope it can
offer better readablity when people read the code below with its use.

... (a)

> +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (0x00000007000000FFULL)));
> +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0);
> +  emit_move_insn (operands[0], tmp_rtn);
> +
> +  if (CONST_INT_P (operands[1]))
>      {
> -      if ((INTVAL (operands[0]) & 0x1) == 0x1)
> +      if ((INTVAL (operands[1]) & 0x1) == 0x1)
>       emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31)));
>        else
>       emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31)));
>  
> -      if ((INTVAL (operands[0]) & 0x2) == 0x2)
> +      if ((INTVAL (operands[1]) & 0x2) == 0x2)
>       emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30)));
>        else
>       emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30)));
> @@ -6476,23 +6493,20 @@
>    else
>      {
>        rtx tmp_rn = gen_reg_rtx (DImode);
> -      rtx tmp_di = gen_reg_rtx (DImode);
>  
>        /* Extract new RN mode from operand.  */
> -      rtx op0 = convert_to_mode (DImode, operands[0], false);
> -      emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
> +      rtx op1 = convert_to_mode (DImode, operands[1], false);
> +      emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));
>  
> -      /* Insert new RN mode into FSCPR.  */
> -      emit_insn (gen_rs6000_mffs (tmp_df));
> -      tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> -      emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> -      emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> +      /* Insert the new RN value from tmp_rn into FPSCR bit [62:63].  */
> +      emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
> +      emit_insn (gen_iordi3 (tmp_di1, tmp_di1, tmp_rn));
>  
>        /* Need to write to field k=15.  The fields are [0:15].  Hence with
> -      L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is an
> -      8-bit field[0:7]. Need to set the bit that corresponds to the
> -      value of i that you want [0:7].  */
> -      tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +         L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is an
> +         8-bit field[0:7]. Need to set the bit that corresponds to the
> +         value of i that you want [0:7].  */

Nit: Wrong indent change for this comment.  Previous tabs are expected.

> +      tmp_df = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0);
>        emit_insn (gen_rs6000_mtfsf (GEN_INT (0x01), tmp_df));
>      }
>    DONE;
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index cdbd4b34a35..8ff34c33d8e 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -18188,7 +18188,7 @@ double __builtin_mffs (void);
>  void __builtin_mtfsf (const int, double);
>  void __builtin_mtfsb0 (const int);
>  void __builtin_mtfsb1 (const int);
> -void __builtin_set_fpscr_rn (int);
> +double __builtin_set_fpscr_rn (int);
>  @end smallexample
>  
>  The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
> @@ -18209,13 +18209,20 @@ values to selected fields of the FPSCR.  The
>  as an argument.  The valid bit range is between 0 and 31.  The builtins map 
> to
>  the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and
>  add 32.  Hence these instructions only modify the FPSCR[32:63] bits by
> -changing the specified bit to a zero or one respectively.  The
> -@code{__builtin_set_fpscr_rn} builtin allows changing both of the floating
> -point rounding mode bits.  The argument is a 2-bit value.  The argument can
> -either be a @code{const int} or stored in a variable. The builtin uses
> -the ISA 3.0
> -instruction @code{mffscrn} if available, otherwise it reads the FPSCR, masks
> -the current rounding mode bits out and OR's in the new value.
> +changing the specified bit to a zero or one respectively.
> +
> +The @code{__builtin_set_fpscr_rn} built-in allows changing both of the 
> floating
> +point rounding mode bits and returning the various FPSCR fields before the RN
> +field is updated.  The built-in returns a double consisting of the initial
> +value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions
> +with all other bits set to zero.  The built-in argument is a 2-bit value for 
> the
> +new RN field value.  The argument can either be an @code{const int} or stored
> +in a variable.  Earlier versions of @code{__builtin_set_fpscr_rn} returned
> +void.  A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added.  If
> +defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR
> +fields.  If not defined, the @code{__builtin_set_fpscr_rn} does not return a
> +vaule.  If the @option{-msoft-float} option is used, the
> +@code{__builtin_set_fpscr_rn} built-in will not return a value.
>  
>  @node Basic PowerPC Built-in Functions Available on ISA 2.05
>  @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05
> diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c 
> b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c
> similarity index 89%
> rename from gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c
> rename to gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c
> index 04707ad8a56..3e22b4e511e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c
> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c
> @@ -1,6 +1,12 @@
> -/* { dg-do run { target { powerpc*-*-* } } } */
> +/* { dg-do run { target powerpc_fprs } } */
>  /* { dg-options "-O2 -std=c99" } */

Nit: Since you remove -std=c99 from _2.c, pls. remove it here to keep 
consistent.

>  
> +/* Originally __builtin_set_fpscr_rn was defined to return void.  It was
> +   later extended to return a double with the various FPSCR bits.  The
> +   extended built-in is intended to be a drop in replacement for the original
> +   version.  This test is for the original version of the built-in and should
> +   work exactly as before.  */
> +
>  #ifdef DEBUG
>  #include <stdio.h>
>  #endif
> @@ -26,7 +32,7 @@ int main ()
>    unsigned long long ll_value;
>    register double  f14;
>  
> -  /* __builtin_set_fpscr_rn() builtin can take a const or a variable
> +  /* __builtin_set_fpscr_rn() built-in can take a const or a variable
>       value between 0 and 3 as the argument.
>       __builtin_mtfsb0 and __builtin_mtfsb1 argument must be a constant 
>       30 or 31.
> diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c 
> b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
> new file mode 100644
> index 00000000000..861b4a8d300
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
> @@ -0,0 +1,151 @@
> +/* { dg-do run { target powerpc_fprs } } */
> +/* { dg-options "-O2" } */
> +
> +/* The __builtin_set_fpscr_rn built-in was originally defined to return
> +   void.  The built-in now returns a double containing the various FPSCR 
> bits.
> +   This test verifies the new return value.  */
> +
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
> +
> +#define RN_MASK  0x3LL             /* RN field mask */
> +#define FIELD_MASK 0x00000007000000FFULL
> +
> +union blah {
> +  double d;
> +  unsigned long long ll;
> +} conv_val;
> +
> +void abort (void);
> +
> +double __attribute__ ((noipa)) wrap_set_fpscr_rn (int val)
> +{
> +  return __builtin_set_fpscr_rn (val);
> +}
> +
> +double __attribute__ ((noipa)) wrap_const_fpscr_rn (int val)
> +{
> +  switch (val)
> +    {
> +    case 0: return __builtin_set_fpscr_rn (0x0);
> +    case 1: return __builtin_set_fpscr_rn (0x1);
> +    case 2: return __builtin_set_fpscr_rn (0x2);
> +    case 3: return __builtin_set_fpscr_rn (0x3);
> +    }
> +}
> +
> +void check_builtin_set_fpscr_rn (unsigned long long initial_fpscr,
> +                              int new_RN, double result)
> +{
> +  register double  f14;
> +  unsigned long long masked_fpscr = initial_fpscr & FIELD_MASK;
> +  
> +  conv_val.d = result;
> +
> +  /* Check the result.  */
> +  if (conv_val.ll != masked_fpscr)
> +    {
> +#ifdef DEBUG
> +       printf("ERROR, __builtin_set_fpscr_rn(%d) did not return expected 
> value %llx.\n",
> +           new_RN, masked_fpscr);
> +       printf("fpscr_val_initial = 0x%llx\n", initial_fpscr);       
> +       printf("result = 0x%llx\n", conv_val.ll);
> +#else
> +       abort();
> +#endif
> +    }
> +
> +  /* Check to see if the RN field was updated.  */
> +    __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +
> +  if ((conv_val.ll & RN_MASK) != new_RN)
> +#ifdef DEBUG
> +    {
> +      printf("ERROR,  __builtin_set_fpscr_rn(%d) did not update RN to 
> %llx.\n",
> +          new_RN, new_RN);
> +      printf("  conv_val.ll = 0x%llx\n", conv_val.ll);
> +    }
> +#else
> +    abort();
> +#endif
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  int val, bit;
> +  double fpscr_val;
> +  unsigned long long fpscr_val_initial;
> +  
> +  unsigned long long ll_value;
> +  union blah src_double;
> +  register double  f14;
> +
> +  /* If __SET_FPSCR_RN_RETURNS_FPSCR__ is defined, the 
> __builtin_set_fpscr_rn()
> +     builtin returns the FPSCR fields.*/
> +  
> +  /* __builtin_set_fpscr_rn() builtin can take a constant or a variable
> +     value between 0 and 3 as the argument.
> +     __builtin_mtfsb0 and __builtin_mtfsb1 argument must be a constant 
> +     30 or 31.
> +  */
> +
> +  /* Test reading the FPSCR register */
> +  __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +
> +  if (conv_val.d != __builtin_mffs())
> +    {
> +#ifdef DEBUG
> +       printf("ERROR, __builtin_mffs() returned 0x%llx, not the expected 
> value 0x%llx\n",
> +           __builtin_mffs(), conv_val.d);
> +#else
> +       abort();
> +#endif
> +    }                  
> +
> +  /* Test return value from __builtin_set_fpscr_rn. The FPSCR fields (DRN, 
> VE,
> +     OE, UE, ZE, XE, NI, RN) are returned and  the RN field of FPSCR is 
> updated
> +     with the specified argument for the built-in.  */
> +
> +  /* Check immediate argument cases */
> +  __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +  fpscr_val_initial = conv_val.ll;
> +
> +  val = 0x0;
> +  fpscr_val = wrap_const_fpscr_rn (val);
> +  check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val);
> +
> +  __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +  fpscr_val_initial = conv_val.ll;
> +
> +  val = 0x3;
> +  fpscr_val = wrap_const_fpscr_rn (val);
> +  check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val);
> +  
> +  /* Check int argument cases */
> +  __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +  fpscr_val_initial = conv_val.ll;
> +
> +  val = 0x1;
> +  fpscr_val = wrap_set_fpscr_rn (val);
> +  check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val);
> +
> +  __asm __volatile ("mffs %0" : "=f"(f14));
> +  conv_val.d = f14;
> +  fpscr_val_initial = conv_val.ll;
> +
> +  val = 0x2;
> +  fpscr_val = wrap_set_fpscr_rn (val);
> +  check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val);
> +  return 0;
> +
> +  /* The __SET_FPSCR_RN_RETURNS_FPSCR__ should be defined.  */
> +  abort();

Nit: These two lines are useless (unreachable), please remove them.

BR,
Kewen

Reply via email to