On Wed, 2 Oct 2024 at 17:48, Matthew Malcomson <mmalcom...@nvidia.com> wrote: > > Thanks Jonathan, > > I agree with your point that having just the check against one of the > overloaded versions is not very robust and having multiple checks against > different versions would be better. > > Unfortunately — while asking the clang folk about this I realised that clang > doesn't expose the resolved versions (e.g. the existing versions like > `__atomic_load_2` etc) to the user. > Instead they allow using SFINAE on these overloaded builtins. > https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461 > > I spent some time looking at this and it seems that enabling SFINAE in GCC > for these builtins is not too problematic (idea being to pass in a > `error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the > context in the C++ frontend, then only emit errors if that boolean is set). > > To Jonathon: > > Would you be OK with using SFINAE to choose whether to use the > __atomic_fetch_add builtin for floating point types in libstdc++?
It's not great for compile times, but no objection otherwise. > > > At C++ frontend maintainers I Cc'd in: > > Are you happy with the idea of enabling SFINAE on overloaded builtins > resolved via resolve_overloaded_builtin? > > To global maintainers I Cc'd in: > > Is there any reason you know of not to enable SFINAE on the overloaded > builtins? > Would it be OK to enable SFINAE on the generic overloaded builtins and add > the parameter so that targets can do the same for their target-specific > builtins (i.e. without changing the behaviour of the existing target specific > builtins)? > > > ________________________________ > From: Jonathan Wakely <jwak...@redhat.com> > Sent: 19 September 2024 3:47 PM > To: Matthew Malcomson <mmalcom...@nvidia.com> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Joseph Myers > <josmy...@redhat.com>; Richard Biener <rguent...@suse.de> > Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins > > External email: Use caution opening links or attachments > > > On Thu, 19 Sept 2024 at 14:12, <mmalcom...@nvidia.com> wrote: > > > > From: Matthew Malcomson <mmalcom...@nvidia.com> > > > > Hello, this is an RFC for adding an atomic floating point fetch_add builtin > > (and variants) to GCC. The atomic fetch_add operation is defined to work > > on the base floating point types in the C++20 standard chapter 31.7.3, and > > extended to work for all cv-unqualified floating point types in C++23 > > chapter 33.5.7.4. > > > > Honestly not sure who to Cc, please do point me to someone else if that's > > better. > > > > This is nowhere near complete (for one thing even the tests I've added > > don't fully pass), but I think I have a complete enough idea that it's > > worth checking if this is something that could be agreed on. > > > > As it stands no target except the nvptx backend would natively support > > these operations. > > > > Main questions that I'm looking to resolve with this RFC: > > 1) Would GCC be OK accepting this implementation even though no backend > > would be implementing these yet? > > - AIUI only the nvptx backend could theoretically implement this. > > - Even without a backend implementing it natively, the ability to use > > this in code (especially libstdc++) enables other compilers to > > generate better code for GPU's using standard C++. > > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)` > > (i.e. a check on the resolved builtin rather than the more user-facing > > one) in order to determine whether floating point atomic fetch_add is > > available. > > Yes, if that name is what other compilers will also use (have you > discussed this with Clang?) > > It looks like PATCH 5/8 only uses the _fp name for fetch_add though, > and just uses fetch_sub etc. for the other functions, is that a > mistake? > > > - N.b. this builtin is actually the builtin working on the "double" > > OK, so the library code just calls the generic __atomic_fetch_add that > accepts any types, but then that gets expanded to a more specific form > for float, double etc.? > And the more specific form has to exist at some level, because we need > an extern symbol from libatomic, so either we include the type as an > explicit suffix on the name, or we use some kind of name mangling like > _Z18__atomic_fetch_addPdS_S_, which is obviously nasty. > > > type, one would have to rely on any compilers implementing that > > particular resolved builtin to also implement the other floating point > > atomic fetch_add builtins that they would want to support in libstdc++ > > `atomic<[floating_point_type]>::fetch_add`. > > This seems a bit concerning. I can imagine somebody implementing these > for float and double first, but leaving long double, _Float64, > _Float32, _Float128 etc. for later. In that case, libstdc++ would not > work if somebody tries to use std::atomic<long double>, or whichever > types aren't supported yet. It's OK if we can be *sure* that won't > happen i.e. that Clang will either implement the new built-in for > *all* FP types, or none. > > > > > More specific questions about the choice of which builtins to implement and > > whether the types are OK: > > 1) Is it OK to not implement the `__sync_*` versions? > > Since these are deprecated and the `__atomic_*` versions are there to > > match the C/C++ code atomic operations (which is a large part of the > > reason for the new floating point operations). > > 2) Is it OK to not implement all the `fetch_*` operations? > > None of the bitwise operations are specified for C++ and bitwise > > operations are (AIUI) rarely used on floating point values. > > That seems OK (entirely correct even) to me. > > > > 3) Wanting to be able to farm out to libatomic meant that we need constant > > names > > for the specialised functions. > > - This led to the naming convention based on floating point type. > > - That naming convention *could* be updated to include the special > > backend > > floating point types if needed. I have not done this mostly because I > > thought it would not add much, though I have not looked into this very > > closely. > > 4) Wanting to name the functions based on floating point type rather than > > size > > meant that the mapping from type passed to the overloaded version to > > specialised builtin was less direct than for the integral versions. > > - Ended up with a hard-coded table in the source to check this. > > - Would very much like some better approach, not certain what better > > approach > > I could find. > > - Will eventually at least share the hard-coded tables (which isn't > > happening yet because this is at RFC level). > > 5) Are there any other types that I should use? > > Similarly are there any types that I'm trying to use that I shouldn't? > > I *believe* I've implemented all the types that make sense and are > > general builtin types. Could easily have missed some (e.g. left > > `_Float128x` alone because AIUI it's larger than 128bits which means we > > don't have any other atomic operations on such data), could also easily > > That seems like a problem though - it means that GCC could be in > exactly the concerning position described above: there could be a FP > type (_Float128x) for which we have no __atomic_fetch_add built-in, > but the __has_builtin(__atomic_fetch_add) check would work, because > the built-in for double exists. So std::atomic<_Float128x> would not > work. > > I don't think that's a problem today, because we don't have a > _Float128x type AFAIK. But that suggests to me that the rule for > whether to define the built-in for a FP type needs to be based on > whether the type exists at all, not whether we have other atomic ops > for it. > > Maybe the libstdc++ code should have a separate __has_builtin check > for each type, e.g. > > bool __use_builtin = false; > +#if __has_builtin(__atomic_fetch_add_fp) > + if constexpr (is_same_v<_Tp, double>) > + __use_builtin = true; > +##endif > +#if __has_builtin(__atomic_fetch_add_fpf) > + if constexpr (is_same_v<_Tp, float>) > + __use_builtin = true; > +##endif > +#if __has_builtin(__atomic_fetch_add_fpl) > + if constexpr (is_same_v<_Tp, long double>) > + __use_builtin = true; > +##endif > // ... > > if (__use_builtin) > return __atomic_fatch_add(...); > // .. current CAS-based implementation > > And so on with __has_builtin checks for the function for each type. > This would be a lot more tedious, but would handle the case where the > new built-in is only supported for some types. > > > > be misunderstanding the mention in the C++ standards of "extended" types > > (which I believe is the `_Float*` and `bfloat16` types). > > Yes. The C++ standard says there's a typedef std::float32_t which > denotes some implementation-defined type, which is _Float32 in our > implementation. But the _Float32 name is not in the C++ standard. > > > > 6) Anything special about floating point maths that I'm tripping up on? > > (Especially around CAS loop where we expand the operation directly, > > sometimes convert a PLUS into a MINUS of a negative value ...). > > Don't know of anything myself, but also a bit wary of floating point > > maths. > > > > N.b. I know that there's a reasonable amount of work left in: > > 1) Ensuring that all the places which use atomic types are updated > > (e.g. asan), > > 2) Ensuring that all frontends can use these to the level that they could > > use the integral atomics. > > 3) Ensuring the major backends can still compile libatomic (I had to do > > something in AArch64 libatomic backend to make it compile, seems like > > there might be more to do for others). > > > > > > Matthew Malcomson (8): > > Define new floating point builtin fetch_add functions > > Add FP types for atomic builtin overload resolution > > Tie the new atomic builtins to the backend > > Have libatomic working as first draft > > Use new builtins in libstdc++ > > First attempt at testsuite > > Mention floating point atomic fetch_add etc in docs > > Add demo implementation of one of the operations > > > > gcc/builtin-types.def | 20 + > > gcc/builtins.cc | 153 ++- > > gcc/c-family/c-common.cc | 88 +- > > gcc/config/aarch64/aarch64.h | 2 + > > gcc/config/aarch64/aarch64.opt | 5 + > > gcc/config/aarch64/atomics.md | 15 + > > gcc/doc/extend.texi | 12 + > > gcc/fortran/types.def | 48 + > > gcc/optabs.cc | 101 +- > > gcc/optabs.def | 6 +- > > gcc/optabs.h | 2 +- > > gcc/sync-builtins.def | 40 + > > gcc/testsuite/gcc.dg/atomic-op-fp.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf128.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf16.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf16b.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf32.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf32x.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf64.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf64x.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpl.c | 204 +++ > > gcc/testsuite/lib/target-supports.exp | 463 ++++++- > > libatomic/Makefile.am | 6 +- > > libatomic/Makefile.in | 12 +- > > libatomic/acinclude.m4 | 49 + > > libatomic/auto-config.h.in | 84 +- > > libatomic/config/linux/aarch64/host-config.h | 2 + > > libatomic/configure | 1153 ++++++++++++++++- > > libatomic/configure.ac | 4 + > > libatomic/fadd_n.c | 23 + > > libatomic/fop_n.c | 5 +- > > libatomic/fsub_n.c | 23 + > > libatomic/libatomic.map | 44 + > > libatomic/libatomic_i.h | 58 + > > libatomic/testsuite/Makefile.in | 1 + > > .../testsuite/libatomic.c/atomic-op-fp.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf128.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf16.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf16b.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf32.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf32x.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf64.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf64x.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpl.c | 203 +++ > > libstdc++-v3/include/bits/atomic_base.h | 16 + > > 47 files changed, 6393 insertions(+), 112 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c > > > > -- > > 2.43.0 > > >