Hi Carl, Excepting for Peter's review comments, some nits are inline below.
on 2023/7/11 03:18, Carl Love wrote: > > GCC maintainers: > > Ver 3, Renamed the patch per comments on ver 2. Previous subject line > was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value". > Fixed spelling mistakes and formatting. Updated define_expand > "rs6000_set_fpscr_rn to have the rs6000_get_fpscr_fields and > rs6000_update_fpscr_rn_field define expands inlined. Optimized the > code and fixed use of temporary register values. Updated the test file > dg-do run arguments and dg-options. Removed the check for > __SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the > overloaded built-in with double argument. Fixed up the documentation > file. Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power > 10 LE. > > Ver 2, Went back thru the requirements and emails. Not sure where I > came up with the requirement for an overloaded version with double > argument. Removed the overloaded version with the double argument. > Added the macro to announce if the __builtin_set_fpscr_rn returns a > void or a double with the FPSCR bits. Updated the documentation file. > Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE. Redid the test > file. Per request, the original test file functionality was not > changed. Just changed the name from test_fpscr_rn_builtin.c to > test_fpscr_rn_builtin_1.c. Put new tests for the return values into a > new test file, test_fpscr_rn_builtin_2.c. > > The GLibC team requested a builtin to replace the mffscrn and > mffscrniinline asm instructions in the GLibC code. Previously there > was discussion on adding builtins for the mffscrn instructions. > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html > > In the end, it was felt that it would be to extend the existing > __builtin_set_fpscr_rn builtin to return a double instead of a void > type. The desire is that we could have the functionality of the > mffscrn and mffscrni instructions on older ISAs. The two instructions > were initially added in ISA 3.0. The __builtin_set_fpscr_rn has the > needed functionality to set the RN field using the mffscrn and mffscrni > instructions if ISA 3.0 is supported or fall back to using logical > instructions to mask and set the bits for earlier ISAs. The > instructions return the current value of the FPSCR fields DRN, VE, OE, > UE, ZE, XE, NI, RN bit positions then update the RN bit positions with > the new RN value provided. > > The current __builtin_set_fpscr_rn builtin has a return type of void. > So, changing the return type to double and returning the FPSCR fields > DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the > functionally equivalent of the mffscrn and mffscrni instructions. Any > current uses of the builtin would just ignore the return value yet any > new uses could use the return value. So the requirement is for the > change to the __builtin_set_fpscr_rn builtin to be backwardly > compatible and work for all ISAs. > > The following patch changes the return type of the > __builtin_set_fpscr_rn builtin from void to double. The return value > is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE, > XE, NI, RN bit positions when the builtin is called. The builtin then > updated the RN field with the new value provided as an argument to the > builtin. The patch adds new testcases to test_fpscr_rn_builtin.c to > check that the builtin returns the current value of the FPSCR fields > and then updates the RN field. > > The GLibC team has reviewed the patch to make sure it met their needs > as a drop in replacement for the inline asm mffscr and mffscrni > statements in the GLibC code. T > > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 > LE. > > Please let me know if the patch is acceptable for mainline. Thanks. > > Carl > > > > ------------------------------------------------- > rs6000, Add return value to __builtin_set_fpscr_rn Nit: One more unexpected space. > > Change the return value from void to double for __builtin_set_fpscr_rn. > The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, > RN bit positions. A new test file, test powerpc/test_fpscr_rn_builtin_2.c, > is added to test the new return value for the built-in. Nit: It would be better to note the newly added __SET_FPSCR_RN_RETURNS_FPSCR__ in commit log as well. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update > built-in definition return type. > * config/rs6000-c.cc (rs6000_target_modify_macros): Add check, > define __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return Nit: s/Added/Add/ > argument to return FPSCR fields. > * doc/extend.texi (__builtin_set_fpscr_rn): Update description for > the return value. Add description for > __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > > gcc/testsuite/ChangeLog: > gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to > test_fpscr_rn_builtin_1.c. Added comment. Nit: s/Added/Add/ and s/Renamed/Rename/. > gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the > return value of __builtin_set_fpscr_rn builtin. > --- > gcc/config/rs6000/rs6000-builtins.def | 2 +- > gcc/config/rs6000/rs6000-c.cc | 5 + > gcc/config/rs6000/rs6000.md | 56 ++++--- > gcc/doc/extend.texi | 23 ++- > ...rn_builtin.c => test_fpscr_rn_builtin_1.c} | 10 +- > .../powerpc/test_fpscr_rn_builtin_2.c | 151 ++++++++++++++++++ > 6 files changed, 215 insertions(+), 32 deletions(-) > rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => > test_fpscr_rn_builtin_1.c} (89%) > create mode 100644 gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 289a37998b1..28788b69c7d 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -237,7 +237,7 @@ > const __ibm128 __builtin_pack_ibm128 (double, double); > PACK_IF packif {ibm128} > > - void __builtin_set_fpscr_rn (const int[0,3]); > + double __builtin_set_fpscr_rn (const int[0,3]); > SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft} > > const double __builtin_unpack_ibm128 (__ibm128, const int<1>); > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index 8555174d36e..07a32badb54 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -604,6 +604,11 @@ rs6000_target_modify_macros (bool define_p, > HOST_WIDE_INT flags) > /* Tell the user -mrop-protect is in play. */ > if (rs6000_rop_protect) > rs6000_define_or_undefine_macro (define_p, "__ROP_PROTECT__"); > + /* Tell the user __builtin_set_fpscr_rn now returns the FPSCR fields > + in a double. Originally the built-in returned void. */ > + if ((flags & OPTION_MASK_SOFT_FLOAT) == 0) > + rs6000_define_or_undefine_macro (define_p, > + "__SET_FPSCR_RN_RETURNS_FPSCR__"); > } > > void > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index b0db8ae508d..1c1aab1af56 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6441,7 +6441,8 @@ > [(set_attr "type" "fp")]) > > (define_expand "rs6000_set_fpscr_rn" > - [(match_operand:SI 0 "reg_or_cint_operand")] > + [(use (match_operand:DF 0 "gpc_reg_operand")) > + (use (match_operand:SI 1 "reg_or_cint_operand"))] > "TARGET_HARD_FLOAT" > { > rtx tmp_df = gen_reg_rtx (DFmode); > @@ -6450,25 +6451,41 @@ > new rounding mode bits from operands[0][62:63] into FPSCR[62:63]. */ > if (TARGET_P9_MISC) > { > - if (const_0_to_3_operand (operands[0], VOIDmode)) > - emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0])); > + if (const_0_to_3_operand (operands[1], VOIDmode)) > + emit_insn (gen_rs6000_mffscrni (tmp_df, operands[1])); > else > { > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + rtx src_df = simplify_gen_subreg (DFmode, op1, DImode, 0); > emit_insn (gen_rs6000_mffscrn (tmp_df, src_df)); > } > - DONE; > + emit_move_insn (operands[0], tmp_df); > + DONE; > } > > - if (CONST_INT_P (operands[0])) > + /* Emulate the behavior of the mffscrni, mffscrn instructions for earlier > + ISAs. Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, > + RN) from the FPSCR. Set the RN field based on the value in operands[1]. > + */ > + > + /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63 (VE, OE, > UE, > + ZE, XE, NI, RN) from the FPSCR and return them. */ > + rtx tmp_di1 = gen_reg_rtx (DImode); Nit: This line is preferred to be move to below (a), close to its use. > + > + emit_insn (gen_rs6000_mffs (tmp_df)); > + rtx tmp_di2 = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); Nit: May be good to rename this tmp_di2 as orig_df_in_di, hope it can offer better readablity when people read the code below with its use. ... (a) > + emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (0x00000007000000FFULL))); > + rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0); > + emit_move_insn (operands[0], tmp_rtn); > + > + if (CONST_INT_P (operands[1])) > { > - if ((INTVAL (operands[0]) & 0x1) == 0x1) > + if ((INTVAL (operands[1]) & 0x1) == 0x1) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31))); > > - if ((INTVAL (operands[0]) & 0x2) == 0x2) > + if ((INTVAL (operands[1]) & 0x2) == 0x2) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30))); > @@ -6476,23 +6493,20 @@ > else > { > rtx tmp_rn = gen_reg_rtx (DImode); > - rtx tmp_di = gen_reg_rtx (DImode); > > /* Extract new RN mode from operand. */ > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3))); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3))); > > - /* Insert new RN mode into FSCPR. */ > - emit_insn (gen_rs6000_mffs (tmp_df)); > - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4))); > - emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn)); > + /* Insert the new RN value from tmp_rn into FPSCR bit [62:63]. */ > + emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4))); > + emit_insn (gen_iordi3 (tmp_di1, tmp_di1, tmp_rn)); > > /* Need to write to field k=15. The fields are [0:15]. Hence with > - L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W). FLM is an > - 8-bit field[0:7]. Need to set the bit that corresponds to the > - value of i that you want [0:7]. */ > - tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > + L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W). FLM is an > + 8-bit field[0:7]. Need to set the bit that corresponds to the > + value of i that you want [0:7]. */ Nit: Wrong indent change for this comment. Previous tabs are expected. > + tmp_df = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0); > emit_insn (gen_rs6000_mtfsf (GEN_INT (0x01), tmp_df)); > } > DONE; > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index cdbd4b34a35..8ff34c33d8e 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -18188,7 +18188,7 @@ double __builtin_mffs (void); > void __builtin_mtfsf (const int, double); > void __builtin_mtfsb0 (const int); > void __builtin_mtfsb1 (const int); > -void __builtin_set_fpscr_rn (int); > +double __builtin_set_fpscr_rn (int); > @end smallexample > > The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} > @@ -18209,13 +18209,20 @@ values to selected fields of the FPSCR. The > as an argument. The valid bit range is between 0 and 31. The builtins map > to > the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and > add 32. Hence these instructions only modify the FPSCR[32:63] bits by > -changing the specified bit to a zero or one respectively. The > -@code{__builtin_set_fpscr_rn} builtin allows changing both of the floating > -point rounding mode bits. The argument is a 2-bit value. The argument can > -either be a @code{const int} or stored in a variable. The builtin uses > -the ISA 3.0 > -instruction @code{mffscrn} if available, otherwise it reads the FPSCR, masks > -the current rounding mode bits out and OR's in the new value. > +changing the specified bit to a zero or one respectively. > + > +The @code{__builtin_set_fpscr_rn} built-in allows changing both of the > floating > +point rounding mode bits and returning the various FPSCR fields before the RN > +field is updated. The built-in returns a double consisting of the initial > +value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions > +with all other bits set to zero. The built-in argument is a 2-bit value for > the > +new RN field value. The argument can either be an @code{const int} or stored > +in a variable. Earlier versions of @code{__builtin_set_fpscr_rn} returned > +void. A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added. If > +defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR > +fields. If not defined, the @code{__builtin_set_fpscr_rn} does not return a > +vaule. If the @option{-msoft-float} option is used, the > +@code{__builtin_set_fpscr_rn} built-in will not return a value. > > @node Basic PowerPC Built-in Functions Available on ISA 2.05 > @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05 > diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > similarity index 89% > rename from gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > rename to gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > index 04707ad8a56..3e22b4e511e 100644 > --- a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > @@ -1,6 +1,12 @@ > -/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-do run { target powerpc_fprs } } */ > /* { dg-options "-O2 -std=c99" } */ Nit: Since you remove -std=c99 from _2.c, pls. remove it here to keep consistent. > > +/* Originally __builtin_set_fpscr_rn was defined to return void. It was > + later extended to return a double with the various FPSCR bits. The > + extended built-in is intended to be a drop in replacement for the original > + version. This test is for the original version of the built-in and should > + work exactly as before. */ > + > #ifdef DEBUG > #include <stdio.h> > #endif > @@ -26,7 +32,7 @@ int main () > unsigned long long ll_value; > register double f14; > > - /* __builtin_set_fpscr_rn() builtin can take a const or a variable > + /* __builtin_set_fpscr_rn() built-in can take a const or a variable > value between 0 and 3 as the argument. > __builtin_mtfsb0 and __builtin_mtfsb1 argument must be a constant > 30 or 31. > diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > new file mode 100644 > index 00000000000..861b4a8d300 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > @@ -0,0 +1,151 @@ > +/* { dg-do run { target powerpc_fprs } } */ > +/* { dg-options "-O2" } */ > + > +/* The __builtin_set_fpscr_rn built-in was originally defined to return > + void. The built-in now returns a double containing the various FPSCR > bits. > + This test verifies the new return value. */ > + > +#ifdef DEBUG > +#include <stdio.h> > +#endif > + > +#define RN_MASK 0x3LL /* RN field mask */ > +#define FIELD_MASK 0x00000007000000FFULL > + > +union blah { > + double d; > + unsigned long long ll; > +} conv_val; > + > +void abort (void); > + > +double __attribute__ ((noipa)) wrap_set_fpscr_rn (int val) > +{ > + return __builtin_set_fpscr_rn (val); > +} > + > +double __attribute__ ((noipa)) wrap_const_fpscr_rn (int val) > +{ > + switch (val) > + { > + case 0: return __builtin_set_fpscr_rn (0x0); > + case 1: return __builtin_set_fpscr_rn (0x1); > + case 2: return __builtin_set_fpscr_rn (0x2); > + case 3: return __builtin_set_fpscr_rn (0x3); > + } > +} > + > +void check_builtin_set_fpscr_rn (unsigned long long initial_fpscr, > + int new_RN, double result) > +{ > + register double f14; > + unsigned long long masked_fpscr = initial_fpscr & FIELD_MASK; > + > + conv_val.d = result; > + > + /* Check the result. */ > + if (conv_val.ll != masked_fpscr) > + { > +#ifdef DEBUG > + printf("ERROR, __builtin_set_fpscr_rn(%d) did not return expected > value %llx.\n", > + new_RN, masked_fpscr); > + printf("fpscr_val_initial = 0x%llx\n", initial_fpscr); > + printf("result = 0x%llx\n", conv_val.ll); > +#else > + abort(); > +#endif > + } > + > + /* Check to see if the RN field was updated. */ > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + > + if ((conv_val.ll & RN_MASK) != new_RN) > +#ifdef DEBUG > + { > + printf("ERROR, __builtin_set_fpscr_rn(%d) did not update RN to > %llx.\n", > + new_RN, new_RN); > + printf(" conv_val.ll = 0x%llx\n", conv_val.ll); > + } > +#else > + abort(); > +#endif > +} > + > +int > +main () > +{ > + int i; > + int val, bit; > + double fpscr_val; > + unsigned long long fpscr_val_initial; > + > + unsigned long long ll_value; > + union blah src_double; > + register double f14; > + > + /* If __SET_FPSCR_RN_RETURNS_FPSCR__ is defined, the > __builtin_set_fpscr_rn() > + builtin returns the FPSCR fields.*/ > + > + /* __builtin_set_fpscr_rn() builtin can take a constant or a variable > + value between 0 and 3 as the argument. > + __builtin_mtfsb0 and __builtin_mtfsb1 argument must be a constant > + 30 or 31. > + */ > + > + /* Test reading the FPSCR register */ > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + > + if (conv_val.d != __builtin_mffs()) > + { > +#ifdef DEBUG > + printf("ERROR, __builtin_mffs() returned 0x%llx, not the expected > value 0x%llx\n", > + __builtin_mffs(), conv_val.d); > +#else > + abort(); > +#endif > + } > + > + /* Test return value from __builtin_set_fpscr_rn. The FPSCR fields (DRN, > VE, > + OE, UE, ZE, XE, NI, RN) are returned and the RN field of FPSCR is > updated > + with the specified argument for the built-in. */ > + > + /* Check immediate argument cases */ > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + fpscr_val_initial = conv_val.ll; > + > + val = 0x0; > + fpscr_val = wrap_const_fpscr_rn (val); > + check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val); > + > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + fpscr_val_initial = conv_val.ll; > + > + val = 0x3; > + fpscr_val = wrap_const_fpscr_rn (val); > + check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val); > + > + /* Check int argument cases */ > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + fpscr_val_initial = conv_val.ll; > + > + val = 0x1; > + fpscr_val = wrap_set_fpscr_rn (val); > + check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val); > + > + __asm __volatile ("mffs %0" : "=f"(f14)); > + conv_val.d = f14; > + fpscr_val_initial = conv_val.ll; > + > + val = 0x2; > + fpscr_val = wrap_set_fpscr_rn (val); > + check_builtin_set_fpscr_rn (fpscr_val_initial, val, fpscr_val); > + return 0; > + > + /* The __SET_FPSCR_RN_RETURNS_FPSCR__ should be defined. */ > + abort(); Nit: These two lines are useless (unreachable), please remove them. BR, Kewen