[PATCH 03/11] c: c++: Define new floating point builtin fetch_add functions

mmalcomson Thu, 14 Nov 2024 06:08:54 -0800

From: Matthew Malcomson <mmalcom...@nvidia.com>

Not sure who to Cc for this.  Honestly just guessing a bit here.  Please
do redirect me if anyone knows of a better set of people to ask.


-------------- >8 ------- 8< -----------
This commit only defines the new names; it does not yet implement them.
I am keeping it as a separate commit because it captures a single
decision, and the message records what that decision was and why:

Add new floating point builtins for each floating point type that
is defined in the general code *except* f128x (which would have a size
greater than 16 bytes, the largest integral atomic operation we
currently support).

We have to base our naming on floating point *types* rather than sizes
since different types can have the same size and the operations need
to be distinguished based on type.  N.b. one could make size-suffixed
builtins that are still overloaded based on types but I thought that
this was the cleaner approach.
(The actual requirement is distinction based on mode, since that is how I
choose which internal function to use in a later patch.  I believe that
defining the function in terms of types and internally mapping to modes
is a sensible split between user interface and internal implementation.)

Have checked with clang developers that they're happy with those names.
https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461

N.b. in order to choose whether these operations are available or not
in something like libstdc++ we use SFINAE on the type.  This is already
available in clang; the link below has the patch where I add this
ability to GCC:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html

N.b. I used the below type suffixes (following what seems like the
existing convention for builtins):
  - float  -> f
  - double -> <no suffix>
  - long double -> l
  - _FloatN -> fN   (for N <- (16, 32, 64, 128))
  - _FloatNx -> fNx (for N <- (32, 64))
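
To make the resulting names concrete, here is a minimal, self-contained
sketch of how the two helper macros added to `sync-builtins.def` in this
patch compose the names.  The helper macros are copied from the diff; the
`DEF_SYNC_BUILTIN` definition, the `STR` helpers, `struct builtin_desc`
and `find_float_builtin` are hypothetical stand-ins for illustration only
and are not how GCC consumes the .def file.  Note that, per the diff, the
names also carry an `_fp` infix before the type suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Helper macros as they appear in this patch's sync-builtins.def.  */
#define DEF_SYNC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)     \
  DEF_SYNC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## F128, NAME "f128", TYPE_MACRO (FLOAT128), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## F32X, NAME "f32x", TYPE_MACRO (FLOAT32X), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## F64X, NAME "f64x", TYPE_MACRO (FLOAT64X), ATTRS)

#define DEF_SYNC_FLOAT_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## _FPF, NAME "_fpf", TYPE_MACRO (FLOAT), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## _FP, NAME "_fp", TYPE_MACRO (DOUBLE), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## _FPL, NAME "_fpl", TYPE_MACRO (LONGDOUBLE), ATTRS) \
  DEF_SYNC_BUILTIN (ENUM ## _FPF16B, NAME "_fpf16b", TYPE_MACRO (BFLOAT16), ATTRS) \
  DEF_SYNC_FLOATN_NX_BUILTINS (ENUM ## _FP, NAME "_fp", TYPE_MACRO, ATTRS)

/* Everything below is a hypothetical consumer written for this sketch.  */
#define STR_(X) #X
#define STR(X) STR_ (X)

struct builtin_desc
{
  const char *name;  /* User-visible builtin name.  */
  const char *type;  /* Name of the function-type enumerator.  */
};

static const struct builtin_desc fetch_add_float_builtins[] = {
/* Record only the name and the (macro-expanded) type enumerator.  */
#define DEF_SYNC_BUILTIN(ENUM, NAME, TYPE, ATTRS) { NAME, STR (TYPE) },
#define FETCH_ADD_TYPE(F) BT_FN_##F##_VPTR_##F##_INT
DEF_SYNC_FLOAT_BUILTINS (BUILT_IN_ATOMIC_FETCH_ADD, "__atomic_fetch_add",
                         FETCH_ADD_TYPE, ATTR_NOTHROWCALL_LEAF_LIST)
#undef FETCH_ADD_TYPE
#undef DEF_SYNC_BUILTIN
};

/* Four standard-type variants plus six _FloatN/_FloatNx variants.  */
static_assert (sizeof fetch_add_float_builtins
               / sizeof fetch_add_float_builtins[0] == 10,
               "expected ten floating point variants");

/* Look a generated builtin up by name; returns NULL if it is not defined.  */
static const struct builtin_desc *
find_float_builtin (const char *name)
{
  size_t n = sizeof fetch_add_float_builtins
             / sizeof fetch_add_float_builtins[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (fetch_add_float_builtins[i].name, name) == 0)
      return &fetch_add_float_builtins[i];
  return NULL;
}
```

In this sketch the generated names run from `__atomic_fetch_add_fpf`
(float) through `__atomic_fetch_add_fpf64x` (_Float64x), and a lookup of
an f128x variant finds nothing, matching the exclusion described above.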

Richi suggested on IRC doing this expansion generally for all these
builtins, following Cxy _Atomic semantics.
Since C hasn't specified any fetch_<op> semantics for floating point
types, C++ has only specified `atomic<>::fetch_{add,sub}`, and the
operations other than these are all bitwise operations (which don't
map well to floating point), I believe I have followed that
suggestion by implementing all fetch_{sub,add}/{add,sub}_fetch
operations.

I have not implemented anything for the __sync_* builtins on the
belief that these are legacy and new code should use the __atomic_*
builtins.  Happy to adjust if that is a bad choice.

Only the new function types were needed for most cases.
The Fortran frontend does not use `builtin-types.def` but does include
`sync-builtins.def`.  Since the new definitions in `sync-builtins.def`
use new enums describing their type, the Fortran `types.def` file needed
to be updated to avoid a build error.
Some of the floating point types that these new functions use may not be
available depending on the target.
`builtin-types.def` defines the associated type enum to
`error_mark_node` when unavailable.  This cannot be done with
`types.def` since the Fortran frontend has no handling for
`error_mark_node` being the type defined with these macros.  Similarly,
the Fortran frontend cannot handle functions defined in
`sync-builtins.def` with `error_mark_node` as a type (while other
frontends can).

The new functions will not automatically be exposed in Fortran by simply
defining them.  If/when they are exposed they will have to be exposed
with knowledge of the floating point semantics of Fortran in order to
correctly handle floating point exceptions when these builtins are
expanded as a CAS loop.  I.e. the current definitions in
`sync-builtins.def` are essentially dead code from the gfortran user's
perspective.
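
For readers unfamiliar with the CAS-loop expansion mentioned above, the
following is a rough sketch of the idea in portable C11, not the expansion
GCC generates.  It assumes a 32-bit `float` and stores the value as bits in
an `_Atomic uint32_t`; the names `float_fetch_add_cas`, `float_to_bits` and
`bits_to_float` are hypothetical.  The addition is redone on every failed
iteration, so it can raise floating point exceptions for a sum that is
never stored, which is the kind of issue a frontend has to account for.
The sketch makes no attempt to handle that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Reinterpret a float as its bit pattern and back (hypothetical helpers).  */
static uint32_t
float_to_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Atomically add OPERAND to the float stored as bits in *OBJ and return
   the value held before the addition (fetch_add semantics).  */
static float
float_fetch_add_cas (_Atomic uint32_t *obj, float operand, memory_order order)
{
  uint32_t old_bits = atomic_load_explicit (obj, memory_order_relaxed);
  for (;;)
    {
      float old_val = bits_to_float (old_bits);
      /* This addition is repeated each time the exchange below fails.  */
      uint32_t new_bits = float_to_bits (old_val + operand);
      /* On failure OLD_BITS is refreshed with the current contents.  */
      if (atomic_compare_exchange_weak_explicit (obj, &old_bits, new_bits,
                                                 order, memory_order_relaxed))
        return old_val;
    }
}
```

An add_fetch variant would return `old_val + operand` instead of `old_val`.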

Since there is no functionality to maintain here, I have introduced a
new macro DEF_DUMMY_FUNCTION_TYPE (defined in f95-lang.cc and used in the
Fortran types.def) which defines a type to match BT_FN_VOID_INT.  It is
used for the new function types whose floating point type the Fortran
types.def does not define (the bfloat16, _FloatN and _FloatNx variants).
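
The pattern can be sketched in isolation as follows.  This is a toy model
under stated assumptions, not the gfortran code: `TOY_TYPES_DEF` stands in
for including `types.def`, `struct toy_type` stands in for a `tree` type
node, and only the macro names and the "alias to BT_FN_VOID_INT" idea come
from the patch.  It shows the list being expanded twice, once to build the
enum and once to fill the table, with the dummy entries reusing an
already-built entry.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a type node.  */
struct toy_type
{
  const char *desc;
};

/* Toy stand-in for types.def.  BT_FN_VOID_INT must precede the dummy
   entries so that it is initialised before they alias it.  */
#define TOY_TYPES_DEF \
  DEF_PRIMITIVE_TYPE (BT_VOID, "void") \
  DEF_PRIMITIVE_TYPE (BT_INT, "int") \
  DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT) \
  DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT16_VPTR_FLOAT16_INT) \
  DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT128_VPTR_FLOAT128_INT)

/* First expansion: every entry, dummy or not, gets an enumerator.  */
enum builtin_type
{
#define DEF_PRIMITIVE_TYPE(NAME, DESC) NAME,
#define DEF_FUNCTION_TYPE_1(NAME, RETURN, ARG1) NAME,
#define DEF_DUMMY_FUNCTION_TYPE(NAME) NAME,
  TOY_TYPES_DEF
#undef DEF_PRIMITIVE_TYPE
#undef DEF_FUNCTION_TYPE_1
#undef DEF_DUMMY_FUNCTION_TYPE
  BT_LAST
};

static struct toy_type toy_storage[BT_LAST];
static const struct toy_type *builtin_types[BT_LAST];

/* Second expansion: fill the table.  Dummy entries do not build a new
   type; they point at the existing BT_FN_VOID_INT entry.  */
static void
init_builtin_types (void)
{
#define DEF_PRIMITIVE_TYPE(ENUM, DESC) \
  toy_storage[ENUM].desc = DESC; \
  builtin_types[ENUM] = &toy_storage[ENUM];
#define DEF_FUNCTION_TYPE_1(ENUM, RETURN, ARG1) \
  toy_storage[ENUM].desc = #ENUM; \
  builtin_types[ENUM] = &toy_storage[ENUM];
#define DEF_DUMMY_FUNCTION_TYPE(ENUM) \
  assert (builtin_types[BT_FN_VOID_INT] != NULL); \
  builtin_types[ENUM] = builtin_types[BT_FN_VOID_INT];
  TOY_TYPES_DEF
#undef DEF_PRIMITIVE_TYPE
#undef DEF_FUNCTION_TYPE_1
#undef DEF_DUMMY_FUNCTION_TYPE
}
```

The point of the design is that every enumerator referenced from
`sync-builtins.def` exists and maps to a valid (if meaningless) type, so
the table never contains a hole for a type the target lacks.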

gcc/ChangeLog:

        * builtin-types.def (BT_FN_FLOAT_VPTR_FLOAT_INT): New type.
        (BT_FN_DOUBLE_VPTR_DOUBLE_INT): New type.
        (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT): New type.
        (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT): New type.
        (BT_FN_FLOAT16_VPTR_FLOAT16_INT): New type.
        (BT_FN_FLOAT32_VPTR_FLOAT32_INT): New type.
        (BT_FN_FLOAT64_VPTR_FLOAT64_INT): New type.
        (BT_FN_FLOAT128_VPTR_FLOAT128_INT): New type.
        (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT): New type.
        (BT_FN_FLOAT64X_VPTR_FLOAT64X_INT): New type.
        * sync-builtins.def (DEF_SYNC_FLOATN_NX_BUILTINS): New.
        (DEF_SYNC_FLOAT_BUILTINS): New.
        (ADD_FETCH_TYPE): New.
        (BUILT_IN_ATOMIC_ADD_FETCH): New.
        (SUB_FETCH_TYPE): New.
        (BUILT_IN_ATOMIC_SUB_FETCH): New.
        (FETCH_ADD_TYPE): New.
        (BUILT_IN_ATOMIC_FETCH_ADD): New.
        (FETCH_SUB_TYPE): New.
        (BUILT_IN_ATOMIC_FETCH_SUB): New.

gcc/fortran/ChangeLog:

        * f95-lang.cc (DEF_DUMMY_FUNCTION_TYPE): New macro.
        * types.def (BT_FLOAT): New type.
        (BT_DOUBLE): New type.
        (BT_LONGDOUBLE): New type.
        (BT_FN_FLOAT_VPTR_FLOAT_INT): New type.
        (BT_FN_DOUBLE_VPTR_DOUBLE_INT): New type.
        (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT): New type.
        (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT): New dummy type.
        (BT_FN_FLOAT16_VPTR_FLOAT16_INT): New dummy type.
        (BT_FN_FLOAT32_VPTR_FLOAT32_INT): New dummy type.
        (BT_FN_FLOAT64_VPTR_FLOAT64_INT): New dummy type.
        (BT_FN_FLOAT128_VPTR_FLOAT128_INT): New dummy type.
        (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT): New dummy type.
        (BT_FN_FLOAT64X_VPTR_FLOAT64X_INT): New dummy type.

Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
---
 gcc/builtin-types.def   | 20 ++++++++++++++++++++
 gcc/fortran/f95-lang.cc |  5 +++++
 gcc/fortran/types.def   | 17 +++++++++++++++++
 gcc/sync-builtins.def   | 40 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 25da582ce58..d0aac6a3e7c 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -802,6 +802,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, BT_I16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT_VPTR_FLOAT_INT, BT_FLOAT, BT_VOLATILE_PTR,
+                    BT_FLOAT, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_VPTR_DOUBLE_INT, BT_DOUBLE, BT_VOLATILE_PTR,
+                    BT_DOUBLE, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT, BT_LONGDOUBLE,
+                    BT_VOLATILE_PTR, BT_LONGDOUBLE, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT, BT_BFLOAT16, BT_VOLATILE_PTR,
+                    BT_BFLOAT16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_VPTR_FLOAT16_INT, BT_FLOAT16, BT_VOLATILE_PTR,
+                    BT_FLOAT16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_VPTR_FLOAT32_INT, BT_FLOAT32, BT_VOLATILE_PTR,
+                    BT_FLOAT32, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT64_VPTR_FLOAT64_INT, BT_FLOAT64, BT_VOLATILE_PTR,
+                    BT_FLOAT64, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT128_VPTR_FLOAT128_INT, BT_FLOAT128, BT_VOLATILE_PTR,
+                    BT_FLOAT128, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT, BT_FLOAT32X, BT_VOLATILE_PTR,
+                    BT_FLOAT32X, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT64X_VPTR_FLOAT64X_INT, BT_FLOAT64X, BT_VOLATILE_PTR,
+                    BT_FLOAT64X, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_INT_PTRPTR_SIZE_SIZE, BT_INT, BT_PTR_PTR, BT_SIZE, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_BOOL_INT_INT_INTPTR, BT_BOOL, BT_INT, BT_INT,
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index 30043cf2f92..55df698bfae 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -648,6 +648,7 @@ gfc_init_builtin_functions (void)
   enum builtin_type
   {
 #define DEF_PRIMITIVE_TYPE(NAME, VALUE) NAME,
+#define DEF_DUMMY_FUNCTION_TYPE(NAME) NAME,
 #define DEF_FUNCTION_TYPE_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_1(NAME, RETURN, ARG1) NAME,
 #define DEF_FUNCTION_TYPE_2(NAME, RETURN, ARG1, ARG2) NAME,
@@ -676,6 +677,7 @@ gfc_init_builtin_functions (void)
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
+#undef DEF_DUMMY_FUNCTION_TYPE
 #undef DEF_FUNCTION_TYPE_0
 #undef DEF_FUNCTION_TYPE_1
 #undef DEF_FUNCTION_TYPE_2
@@ -1068,6 +1070,8 @@ gfc_init_builtin_functions (void)
 
 #define DEF_PRIMITIVE_TYPE(ENUM, VALUE) \
   builtin_types[(int) ENUM] = VALUE;
+#define DEF_DUMMY_FUNCTION_TYPE(ENUM) \
+  builtin_types[(int) ENUM] = builtin_types[(int) BT_FN_VOID_INT];
 #define DEF_FUNCTION_TYPE_0(ENUM, RETURN)                       \
   builtin_types[(int) ENUM]                                     \
     = build_function_type_list (builtin_types[(int) RETURN],   \
@@ -1231,6 +1235,7 @@ gfc_init_builtin_functions (void)
     = build_pointer_type (builtin_types[(int) TYPE]);
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
+#undef DEF_DUMMY_FUNCTION_TYPE
 #undef DEF_FUNCTION_TYPE_0
 #undef DEF_FUNCTION_TYPE_1
 #undef DEF_FUNCTION_TYPE_2
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index a69e25206f1..5e622a41040 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -59,6 +59,10 @@ DEF_PRIMITIVE_TYPE (BT_I4, builtin_type_for_size (BITS_PER_UNIT*4, 1))
 DEF_PRIMITIVE_TYPE (BT_I8, builtin_type_for_size (BITS_PER_UNIT*8, 1))
 DEF_PRIMITIVE_TYPE (BT_I16, builtin_type_for_size (BITS_PER_UNIT*16, 1))
 
+DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
+DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
+DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
+
 DEF_PRIMITIVE_TYPE (BT_PTR, ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_CONST_PTR, const_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_VOLATILE_PTR,
@@ -143,6 +147,19 @@ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT
 DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT_VPTR_FLOAT_INT, BT_FLOAT, BT_VOLATILE_PTR,
+                    BT_FLOAT, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_VPTR_DOUBLE_INT, BT_DOUBLE, BT_VOLATILE_PTR,
+                    BT_DOUBLE, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT, BT_LONGDOUBLE,
+                    BT_VOLATILE_PTR, BT_LONGDOUBLE, BT_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT16_VPTR_FLOAT16_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT32_VPTR_FLOAT32_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT64_VPTR_FLOAT64_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT128_VPTR_FLOAT128_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT)
+DEF_DUMMY_FUNCTION_TYPE (BT_FN_FLOAT64X_VPTR_FLOAT64X_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I1_INT, BT_VOID, BT_VOLATILE_PTR, BT_I1, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT)
diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
index b4ec3782799..89cc564a8f6 100644
--- a/gcc/sync-builtins.def
+++ b/gcc/sync-builtins.def
@@ -28,6 +28,30 @@ along with GCC; see the file COPYING3.  If not see
    is supposed to be using.  It's overloaded, and is resolved to one of the
    "_1" through "_16" versions, plus some extra casts.  */
 
+
+/* Same as DEF_GCC_FLOATN_NX_BUILTINS, except for sync builtins.
+   N.b. we do not define the f128x type because this would be larger than the
+   16 byte integral types that we have atomic support for.  That would mean
+   we couldn't implement them without adding special extra handling --
+   especially because to act atomically on such large sizes all architectures
+   would require locking implementations added in libatomic.  */
+#undef DEF_SYNC_FLOATN_NX_BUILTINS
+#define DEF_SYNC_FLOATN_NX_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS)     \
+  DEF_SYNC_BUILTIN (ENUM ## F16, NAME "f16", TYPE_MACRO (FLOAT16), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## F32, NAME "f32", TYPE_MACRO (FLOAT32), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## F64, NAME "f64", TYPE_MACRO (FLOAT64), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## F128, NAME "f128", TYPE_MACRO (FLOAT128), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## F32X, NAME "f32x", TYPE_MACRO (FLOAT32X), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## F64X, NAME "f64x", TYPE_MACRO (FLOAT64X), ATTRS)
+
+#undef DEF_SYNC_FLOAT_BUILTINS
+#define DEF_SYNC_FLOAT_BUILTINS(ENUM, NAME, TYPE_MACRO, ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## _FPF, NAME "_fpf", TYPE_MACRO (FLOAT), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## _FP, NAME "_fp", TYPE_MACRO (DOUBLE), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## _FPL, NAME "_fpl", TYPE_MACRO (LONGDOUBLE), ATTRS) \
+  DEF_SYNC_BUILTIN (ENUM ## _FPF16B, NAME "_fpf16b", TYPE_MACRO (BFLOAT16), ATTRS) \
+  DEF_SYNC_FLOATN_NX_BUILTINS (ENUM ## _FP, NAME "_fp", TYPE_MACRO, ATTRS)
+
 DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_N, "__sync_fetch_and_add",
                  BT_FN_VOID_VAR, ATTR_NOTHROWCALL_LEAF_LIST)
 DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_1, "__sync_fetch_and_add_1",
@@ -378,6 +402,10 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ADD_FETCH_8,
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ADD_FETCH_16,
                  "__atomic_add_fetch_16",
                  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+#define ADD_FETCH_TYPE(F) BT_FN_##F##_VPTR_##F##_INT
+DEF_SYNC_FLOAT_BUILTINS (BUILT_IN_ATOMIC_ADD_FETCH, "__atomic_add_fetch",
+                         ADD_FETCH_TYPE, ATTR_NOTHROWCALL_LEAF_LIST)
+#undef ADD_FETCH_TYPE
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_SUB_FETCH_N,
                  "__atomic_sub_fetch",
@@ -397,6 +425,10 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_SUB_FETCH_8,
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_SUB_FETCH_16,
                  "__atomic_sub_fetch_16",
                  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+#define SUB_FETCH_TYPE(F) BT_FN_##F##_VPTR_##F##_INT
+DEF_SYNC_FLOAT_BUILTINS (BUILT_IN_ATOMIC_SUB_FETCH, "__atomic_sub_fetch",
+                         SUB_FETCH_TYPE, ATTR_NOTHROWCALL_LEAF_LIST)
+#undef SUB_FETCH_TYPE
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_AND_FETCH_N,
                  "__atomic_and_fetch",
@@ -492,6 +524,10 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_ADD_8,
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_ADD_16,
                  "__atomic_fetch_add_16",
                  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+#define FETCH_ADD_TYPE(F) BT_FN_##F##_VPTR_##F##_INT
+DEF_SYNC_FLOAT_BUILTINS (BUILT_IN_ATOMIC_FETCH_ADD, "__atomic_fetch_add",
+                         FETCH_ADD_TYPE, ATTR_NOTHROWCALL_LEAF_LIST)
+#undef FETCH_ADD_TYPE
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_SUB_N,
                  "__atomic_fetch_sub",
@@ -511,6 +547,10 @@ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_SUB_8,
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_SUB_16,
                  "__atomic_fetch_sub_16",
                  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+#define FETCH_SUB_TYPE(F) BT_FN_##F##_VPTR_##F##_INT
+DEF_SYNC_FLOAT_BUILTINS (BUILT_IN_ATOMIC_FETCH_SUB, "__atomic_fetch_sub",
+                         FETCH_SUB_TYPE, ATTR_NOTHROWCALL_LEAF_LIST)
+#undef FETCH_SUB_TYPE
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_AND_N,
                  "__atomic_fetch_and",
-- 
2.43.0
