Hi Carl, Some more minor comments are inline below on top of Peter's insightful review comments.
on 2023/7/1 08:58, Carl Love wrote: > > GCC maintainers: > > Ver 2, Went back thru the requirements and emails. Not sure where I > came up with the requirement for an overloaded version with double > argument. Removed the overloaded version with the double argument. > Added the macro to announce if the __builtin_set_fpscr_rn returns a > void or a double with the FPSCR bits. Updated the documentation file. > Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE. Redid the test > file. Per request, the original test file functionality was not > changed. Just changed the name from test_fpscr_rn_builtin.c to > test_fpscr_rn_builtin_1.c. Put new tests for the return values into a > new test file, test_fpscr_rn_builtin_2.c. > > The GLibC team requested a builtin to replace the mffscrn and > mffscrniinline asm instructions in the GLibC code. Previously there > was discussion on adding builtins for the mffscrn instructions. > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html > > In the end, it was felt that it would be to extend the existing > __builtin_set_fpscr_rn builtin to return a double instead of a void > type. The desire is that we could have the functionality of the > mffscrn and mffscrni instructions on older ISAs. The two instructions > were initially added in ISA 3.0. The __builtin_set_fpscr_rn has the > needed functionality to set the RN field using the mffscrn and mffscrni > instructions if ISA 3.0 is supported or fall back to using logical > instructions to mask and set the bits for earlier ISAs. The > instructions return the current value of the FPSCR fields DRN, VE, OE, > UE, ZE, XE, NI, RN bit positions then update the RN bit positions with > the new RN value provided. > > The current __builtin_set_fpscr_rn builtin has a return type of void. > So, changing the return type to double and returning the FPSCR fields > DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the > functionally equivalent of the mffscrn and mffscrni instructions. Any > current uses of the builtin would just ignore the return value yet any > new uses could use the return value. So the requirement is for the > change to the __builtin_set_fpscr_rn builtin to be backwardly > compatible and work for all ISAs. > > The following patch changes the return type of the > __builtin_set_fpscr_rn builtin from void to double. The return value > is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE, > XE, NI, RN bit positions when the builtin is called. The builtin then > updated the RN field with the new value provided as an argument to the > builtin. The patch adds new testcases to test_fpscr_rn_builtin.c to > check that the builtin returns the current value of the FPSCR fields > and then updates the RN field. > > The GLibC team has reviewed the patch to make sure it met their needs > as a drop in replacement for the inline asm mffscr and mffscrni > statements in the GLibC code. T > > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 > LE. > > Please let me know if the patch is acceptable for mainline. Thanks. > > Carl > > > ---------------------------------- > rs6000, __builtin_set_fpscr_rn add retrun value > > Change the return value from void to double. The return value consists of > the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions. Add an > overloaded version which accepts a double argument. > > The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the > double reterun value and the new double argument. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update > builtin definition return type. > * config/rs6000-c.cc(rs6000_target_modify_macros): Add check, define > __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New > define_expand. > (rs6000_update_fpscr_rn_field): New define_expand. > (rs6000_set_fpscr_rn): Added return argument. Updated to use new > rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define > _expands. > * doc/extend.texi (__builtin_set_fpscr_rn): Update description for > the return value and new double argument. Add descripton for > __SET_FPSCR_RN_RETURNS_FPSCR__ macro. > > gcc/testsuite/ChangeLog: > gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to > test_fpscr_rn_builtin_1.c. Added comment. > gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the > return value of __builtin_set_fpscr_rn builtin. > --- > gcc/config/rs6000/rs6000-builtins.def | 2 +- > gcc/config/rs6000/rs6000-c.cc | 4 + > gcc/config/rs6000/rs6000.md | 87 +++++++--- > gcc/doc/extend.texi | 26 ++- > ...rn_builtin.c => test_fpscr_rn_builtin_1.c} | 6 + > .../powerpc/test_fpscr_rn_builtin_2.c | 153 ++++++++++++++++++ > 6 files changed, 246 insertions(+), 32 deletions(-) > rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => > test_fpscr_rn_builtin_1.c} (92%) > create mode 100644 gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 289a37998b1..28788b69c7d 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -237,7 +237,7 @@ > const __ibm128 __builtin_pack_ibm128 (double, double); > PACK_IF packif {ibm128} > > - void __builtin_set_fpscr_rn (const int[0,3]); > + double __builtin_set_fpscr_rn (const int[0,3]); > SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft} > > const double __builtin_unpack_ibm128 (__ibm128, const int<1>); > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index 8555174d36e..8373bb66919 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -604,6 +604,10 @@ rs6000_target_modify_macros (bool define_p, > HOST_WIDE_INT flags) > /* Tell the user -mrop-protect is in play. */ > if (rs6000_rop_protect) > rs6000_define_or_undefine_macro (define_p, "__ROP_PROTECT__"); > + /* Tell the user the __builtin_set_fpscr_rn now returns the FPSCR fields > + in a double. Originally the builtin returned void. */ > + if ((flags & OPTION_MASK_SOFT_FLOAT) == 0) > + rs6000_define_or_undefine_macro (define_p, > "__SET_FPSCR_RN_RETURNS_FPSCR__"); > } > > void > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index b0db8ae508d..1b77a13c8a1 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6440,8 +6440,51 @@ > "mffscdrn %0,%1" > [(set_attr "type" "fp")]) > > + > +(define_expand "rs6000_get_fpscr_fields" > + [(match_operand:DF 0 "gpc_reg_operand")] > + "TARGET_HARD_FLOAT" > +{ > + /* Extract fields bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, > + RN) from the FPSCR and return them. */ > + rtx tmp_df = gen_reg_rtx (DFmode); > + rtx tmp_di = gen_reg_rtx (DImode); > + > + emit_insn (gen_rs6000_mffs (tmp_df)); > + tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > + emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x00000007000000FFULL))); > + rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > + emit_move_insn (operands[0], tmp_rtn); > + DONE; > +}) > + > +(define_expand "rs6000_update_fpscr_rn_field" > + [(match_operand:DI 0 "gpc_reg_operand")] > + "TARGET_HARD_FLOAT" > +{ > + /* Insert the new RN value from operands[0] into FPSCR bit [62:63]. */ > + rtx tmp_di = gen_reg_rtx (DImode); > + rtx tmp_df = gen_reg_rtx (DFmode); > + > + emit_insn (gen_rs6000_mffs (tmp_df)); > + tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > + emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4))); > + emit_insn (gen_iordi3 (tmp_di, tmp_di, operands[0])); > + > + /* Need to write to field k=15. The fields are [0:15]. Hence with > + L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W). FLM is an > + 8-bit field[0:7]. Need to set the bit that corresponds to the > + value of i that you want [0:7]. */ > + > + tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > + emit_insn (gen_rs6000_mtfsf (GEN_INT (0x01), tmp_df)); > + DONE; > +}) > + > (define_expand "rs6000_set_fpscr_rn" > - [(match_operand:SI 0 "reg_or_cint_operand")] > + [(set (match_operand:DF 0 "gpc_reg_operand") > + (unspec_volatile:DF [(match_operand:SI 1 "reg_or_cint_operand")] > + UNSPECV_MFFSCDRN))] I don't think we need this pattern to be complete, since it's always expanded with "DONE", the pattern is useless, so it can be simpler like: [(use (match_operand:DF 0 "gpc_reg_operand")) (use (match_operand:SI 1 "reg_or_cint_operand"))] > "TARGET_HARD_FLOAT" > { > rtx tmp_df = gen_reg_rtx (DFmode); > @@ -6450,25 +6493,34 @@ > new rounding mode bits from operands[0][62:63] into FPSCR[62:63]. */ > if (TARGET_P9_MISC) > { > - if (const_0_to_3_operand (operands[0], VOIDmode)) > - emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0])); > + if (const_0_to_3_operand (operands[1], VOIDmode)) > + emit_insn (gen_rs6000_mffscrni (tmp_df, operands[1])); > else > { > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + rtx src_df = simplify_gen_subreg (DFmode, op1, DImode, 0); > emit_insn (gen_rs6000_mffscrn (tmp_df, src_df)); > } > - DONE; > + emit_move_insn (operands[0], tmp_df); > + DONE; > } > > - if (CONST_INT_P (operands[0])) > + /* Emulate the behavior of the mffscrni, mffscrn instructions for earlier > + ISAs. Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, > + RN) from the FPSCR. Set the RN field based on the value in operands[1]. > + */ > + > + /* Get the current FPSCR fields to return. */ > + emit_insn (gen_rs6000_get_fpscr_fields (operands[0])); IMHO the above define_expand rs6000_get_fpscr_fields isn't needed, just inline it here, since it doesn't get reused somewhere else, and more importantly the read fpscr in DImode (tmp_di from simplify_gen_subreg on tmp_df) can be shared with the uses in rs6000_update_fpscr_rn_field (so I'd also suggest not factoring out the previous RN bit handlings into rs6000_update_fpscr_rn_field). > + > + if (CONST_INT_P (operands[1])) > { > - if ((INTVAL (operands[0]) & 0x1) == 0x1) > + if ((INTVAL (operands[1]) & 0x1) == 0x1) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31))); > > - if ((INTVAL (operands[0]) & 0x2) == 0x2) > + if ((INTVAL (operands[1]) & 0x2) == 0x2) > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30))); > else > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30))); > @@ -6476,24 +6528,13 @@ > else > { > rtx tmp_rn = gen_reg_rtx (DImode); > - rtx tmp_di = gen_reg_rtx (DImode); > > /* Extract new RN mode from operand. */ > - rtx op0 = convert_to_mode (DImode, operands[0], false); > - emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3))); > + rtx op1 = convert_to_mode (DImode, operands[1], false); > + emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3))); > > /* Insert new RN mode into FSCPR. */ > - emit_insn (gen_rs6000_mffs (tmp_df)); > - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4))); > - emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn)); > - > - /* Need to write to field k=15. The fields are [0:15]. Hence with > - L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W). FLM is an > - 8-bit field[0:7]. Need to set the bit that corresponds to the > - value of i that you want [0:7]. */ > - tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > - emit_insn (gen_rs6000_mtfsf (GEN_INT (0x01), tmp_df)); > + emit_insn (gen_rs6000_update_fpscr_rn_field (tmp_rn)); > } > DONE; > }) > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index cdbd4b34a35..fee35ac40ec 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -18188,7 +18188,6 @@ double __builtin_mffs (void); > void __builtin_mtfsf (const int, double); > void __builtin_mtfsb0 (const int); > void __builtin_mtfsb1 (const int); > -void __builtin_set_fpscr_rn (int); Just update void to double here, instead of the below seperated smallexample? As the below documentation is to describe these above functions one by one, just moving one down looks a bit inconsistent. > @end smallexample > > The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} > @@ -18209,13 +18208,24 @@ values to selected fields of the FPSCR. The > as an argument. The valid bit range is between 0 and 31. The builtins map > to > the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and > add 32. Hence these instructions only modify the FPSCR[32:63] bits by > -changing the specified bit to a zero or one respectively. The > -@code{__builtin_set_fpscr_rn} builtin allows changing both of the floating > -point rounding mode bits. The argument is a 2-bit value. The argument can > -either be a @code{const int} or stored in a variable. The builtin uses > -the ISA 3.0 > -instruction @code{mffscrn} if available, otherwise it reads the FPSCR, masks > -the current rounding mode bits out and OR's in the new value. > +changing the specified bit to a zero or one respectively. > + > +@smallexample > +double __builtin_set_fpscr_rn (int); > +@end smallexample > + > +The @code{__builtin_set_fpscr_rn} builtin allows changing both of the > floating > +point rounding mode bits and returning the various FPSCR fields before the RN > +field is updated. The builtin returns a double consisting of the initial > value > +of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions with > all > +other bits set to zero. The builtin argument is a 2-bit value for the new RN > +field value. The argument can either be an @code{const int} or stored in a > +variable. Earlier versions of @code{__builtin_set_fpscr_rn} returned void. > A > +@code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added. If defined, then > +the @code{__builtin_set_fpscr_rn} builtin returns the FPSCR fields. If not > +defined, the @code{__builtin_set_fpscr_rn} does not return a vaule. If the > +@option{-msoft-float} option is used, the @code{__builtin_set_fpscr_rn} > builtin > +will not return a value. __builtin_set_fpscr_rn isn't available for -msoft-float, your patch doesn't change it so IMHO you don't need to note this. > > @node Basic PowerPC Built-in Functions Available on ISA 2.05 > @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05 > diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > similarity index 92% > rename from gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > rename to gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > index 04707ad8a56..d917ee19656 100644 > --- a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c > +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_1.c > @@ -1,6 +1,12 @@ > /* { dg-do run { target { powerpc*-*-* } } } */ > /* { dg-options "-O2 -std=c99" } */ > > +/* Originally the __builtin_set_fpscr_rn builtin was defined to return > + void. It was later extended to return a double with the various > + FPSCR bits. The extended builtin is inteded to be a drop in replacement > + for the original version. This test is for the original version of the > + builtin and should work exactly as before. */ > + > #ifdef DEBUG > #include <stdio.h> > #endif > diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > new file mode 100644 > index 00000000000..62fca67c948 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c > @@ -0,0 +1,153 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-options "-O2 -std=c99" } */ Nit: The previous existing test_fpscr_rn_builtin.c introduced -std=c99 but I think it is useless for this case, so you can remove it if you like. > + > +/* The __builtin_set_fpscr_rn builtin was originally defined to return > + void. It was to return a double with the various FPSCR bits. This test > + verifies the new version returns the desired FPSCR bits. */ > + > +#ifdef DEBUG > +#include <stdio.h> > +#endif > + > +#define RN_MASK 0x3LL /* RN field mask */ > +#define FIELD_MASK 0x00000007000000FFULL > + > +union blah { > + double d; > + unsigned long long ll; > +} conv_val; > + > +void abort (void); > +double __attribute__ ((noipa)) wrap_set_fpscr_rn (int val) > +{ > + return __builtin_set_fpscr_rn (val); > +} > + Nit: I noticed you want this case to abort if __SET_FPSCR_RN_RETURNS_FPSCR__ doesn't get defined, but actually the compilation should fail for compiling this function wrap_set_fpscr_rn already, so the below __SET_FPSCR_RN_RETURNS_FPSCR__ check and abort seems useless. That said, the explicit use of return value of this built-in __builtin_set_fpscr_rn already implicitly checks it. BR, Kewen