Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

Jonathan Wakely Wed, 02 Oct 2024 12:40:45 -0700

On Wed, 2 Oct 2024 at 17:48, Matthew Malcomson <mmalcom...@nvidia.com> wrote:
>
> Thanks Jonathan,
>
> I agree with your point that having just the check against one of the 
> overloaded versions is not very robust and having multiple checks against 
> different versions would be better.
>
> Unfortunately — while asking the clang folk about this I realised that clang 
> doesn't expose the resolved versions (e.g. the existing versions like 
> `__atomic_load_2` etc) to the user.
> Instead they allow using SFINAE on these overloaded builtins.
> https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461
>
> I spent some time looking at this and it seems that enabling SFINAE in GCC 
> for these builtins is not too problematic (idea being to pass in a 
> `error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the 
> context in the C++ frontend, then only emit errors if that boolean is set).
>
> To Jonathon:
>
> Would you be OK with using SFINAE to choose whether to use the 
> __atomic_fetch_add builtin for floating point types in libstdc++?


It's not great for compile times, but no objection otherwise.

>
>
> At C++ frontend maintainers I Cc'd in:
>
> Are you happy with the idea of enabling SFINAE on overloaded builtins 
> resolved via resolve_overloaded_builtin?
>
> To global maintainers I Cc'd in:
>
> Is there any reason you know of not to enable SFINAE on the overloaded 
> builtins?
> Would it be OK to enable SFINAE on the generic overloaded builtins and add 
> the parameter so that targets can do the same for their target-specific 
> builtins (i.e. without changing the behaviour of the existing target specific 
> builtins)?
>
>
> ________________________________
> From: Jonathan Wakely <jwak...@redhat.com>
> Sent: 19 September 2024 3:47 PM
> To: Matthew Malcomson <mmalcom...@nvidia.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Joseph Myers 
> <josmy...@redhat.com>; Richard Biener <rguent...@suse.de>
> Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins
>
> External email: Use caution opening links or attachments
>
>
> On Thu, 19 Sept 2024 at 14:12, <mmalcom...@nvidia.com> wrote:
> >
> > From: Matthew Malcomson <mmalcom...@nvidia.com>
> >
> > Hello, this is an RFC for adding an atomic floating point fetch_add builtin
> > (and variants) to GCC.  The atomic fetch_add operation is defined to work
> > on the base floating point types in the C++20 standard chapter 31.7.3, and
> > extended to work for all cv-unqualified floating point types in C++23
> > chapter 33.5.7.4.
> >
> > Honestly not sure who to Cc, please do point me to someone else if that's
> > better.
> >
> > This is nowhere near complete (for one thing even the tests I've added
> > don't fully pass), but I think I have a complete enough idea that it's
> > worth checking if this is something that could be agreed on.
> >
> > As it stands no target except the nvptx backend would natively support
> > these operations.
> >
> > Main questions that I'm looking to resolve with this RFC:
> > 1) Would GCC be OK accepting this implementation even though no backend
> >    would be implementing these yet?
> >    - AIUI only the nvptx backend could theoretically implement this.
> >    - Even without a backend implementing it natively, the ability to use
> >      this in code (especially libstdc++) enables other compilers to
> >      generate better code for GPU's using standard C++.
> > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
> >    (i.e. a check on the resolved builtin rather than the more user-facing
> >    one) in order to determine whether floating point atomic fetch_add is
> >    available.
>
> Yes, if that name is what other compilers will also use (have you
> discussed this with Clang?)
>
> It looks like PATCH 5/8 only uses the _fp name for fetch_add though,
> and just uses fetch_sub etc. for the other functions, is that a
> mistake?
>
> >    - N.b. this builtin is actually the builtin working on the "double"
>
> OK, so the library code just calls the generic __atomic_fetch_add that
> accepts any types, but then that gets expanded to a more specific form
> for float, double etc.?
> And the more specific form has to exist at some level, because we need
> an extern symbol from libatomic, so either we include the type as an
> explicit suffix on the name, or we use some kind of name mangling like
> _Z18__atomic_fetch_addPdS_S_, which is obviously nasty.
>
> >      type, one would have to rely on any compilers implementing that
> >      particular resolved builtin to also implement the other floating point
> >      atomic fetch_add builtins that they would want to support in libstdc++
> >      `atomic<[floating_point_type]>::fetch_add`.
>
> This seems a bit concerning. I can imagine somebody implementing these
> for float and double first, but leaving long double, _Float64,
> _Float32, _Float128 etc. for later. In that case, libstdc++ would not
> work if somebody tries to use std::atomic<long double>, or whichever
> types aren't supported yet. It's OK if we can be *sure* that won't
> happen i.e. that Clang will either implement the new built-in for
> *all* FP types, or none.
>
> >
> > More specific questions about the choice of which builtins to implement and
> > whether the types are OK:
> > 1) Is it OK to not implement the `__sync_*` versions?
> >    Since these are deprecated and the `__atomic_*` versions are there to
> >    match the C/C++ code atomic operations (which is a large part of the
> >    reason for the new floating point operations).
> > 2) Is it OK to not implement all the `fetch_*` operations?
> >    None of the bitwise operations are specified for C++ and bitwise
> >    operations are (AIUI) rarely used on floating point values.
>
> That seems OK (entirely correct even) to me.
>
>
> > 3) Wanting to be able to farm out to libatomic meant that we need constant 
> > names
> >    for the specialised functions.
> >    - This led to the naming convention based on floating point type.
> >    - That naming convention *could* be updated to include the special 
> > backend
> >      floating point types if needed.  I have not done this mostly because I
> >      thought it would not add much, though I have not looked into this very
> >      closely.
> > 4) Wanting to name the functions based on floating point type rather than 
> > size
> >    meant that the mapping from type passed to the overloaded version to
> >    specialised builtin was less direct than for the integral versions.
> >    - Ended up with a hard-coded table in the source to check this.
> >    - Would very much like some better approach, not certain what better 
> > approach
> >      I could find.
> >    - Will eventually at least share the hard-coded tables (which isn't
> >      happening yet because this is at RFC level).
> > 5) Are there any other types that I should use?
> >    Similarly are there any types that I'm trying to use that I shouldn't?
> >    I *believe* I've implemented all the types that make sense and are
> >    general builtin types.  Could easily have missed some (e.g. left
> >    `_Float128x` alone because AIUI it's larger than 128bits which means we
> >    don't have any other atomic operations on such data), could also easily
>
> That seems like a problem though - it means that GCC could be in
> exactly the concerning position described above: there could be a FP
> type (_Float128x) for which we have no __atomic_fetch_add built-in,
> but the __has_builtin(__atomic_fetch_add) check would work, because
> the built-in for double exists. So std::atomic<_Float128x> would not
> work.
>
> I don't think that's a problem today, because we don't have a
> _Float128x type AFAIK. But that suggests to me that the rule for
> whether to define the built-in for a FP type needs to be based on
> whether the type exists at all, not whether we have other atomic ops
> for it.
>
> Maybe the libstdc++ code should have a separate __has_builtin check
> for each type, e.g.
>
>          bool __use_builtin = false;
> +#if __has_builtin(__atomic_fetch_add_fp)
> +       if constexpr (is_same_v<_Tp, double>)
> +         __use_builtin = true;
> +##endif
> +#if __has_builtin(__atomic_fetch_add_fpf)
> +       if constexpr (is_same_v<_Tp, float>)
> +         __use_builtin = true;
> +##endif
> +#if __has_builtin(__atomic_fetch_add_fpl)
> +       if constexpr (is_same_v<_Tp, long double>)
> +         __use_builtin = true;
> +##endif
> // ...
>
>         if (__use_builtin)
>          return __atomic_fatch_add(...);
>        // .. current CAS-based implementation
>
> And so on with __has_builtin checks for the function for each type.
> This would be a lot more tedious, but would handle the case where the
> new built-in is only supported for some types.
>
>
> >    be misunderstanding the mention in the C++ standards of "extended" types
> >    (which I believe is the `_Float*` and `bfloat16` types).
>
> Yes. The C++ standard says there's a typedef std::float32_t which
> denotes some implementation-defined type, which is _Float32 in our
> implementation. But the _Float32 name is not in the C++ standard.
>
>
> > 6) Anything special about floating point maths that I'm tripping up on?
> >    (Especially around CAS loop where we expand the operation directly,
> >    sometimes convert a PLUS into a MINUS of a negative value ...).
> >    Don't know of anything myself, but also a bit wary of floating point
> >    maths.
> >
> > N.b. I know that there's a reasonable amount of work left in:
> > 1) Ensuring that all the places which use atomic types are updated
> >    (e.g. asan),
> > 2) Ensuring that all frontends can use these to the level that they could
> >    use the integral atomics.
> > 3) Ensuring the major backends can still compile libatomic (I had to do
> >    something in AArch64 libatomic backend to make it compile, seems like
> >    there might be more to do for others).
> >
> >
> > Matthew Malcomson (8):
> >   Define new floating point builtin fetch_add functions
> >   Add FP types for atomic builtin overload resolution
> >   Tie the new atomic builtins to the backend
> >   Have libatomic working as first draft
> >   Use new builtins in libstdc++
> >   First attempt at testsuite
> >   Mention floating point atomic fetch_add etc in docs
> >   Add demo implementation of one of the operations
> >
> >  gcc/builtin-types.def                         |   20 +
> >  gcc/builtins.cc                               |  153 ++-
> >  gcc/c-family/c-common.cc                      |   88 +-
> >  gcc/config/aarch64/aarch64.h                  |    2 +
> >  gcc/config/aarch64/aarch64.opt                |    5 +
> >  gcc/config/aarch64/atomics.md                 |   15 +
> >  gcc/doc/extend.texi                           |   12 +
> >  gcc/fortran/types.def                         |   48 +
> >  gcc/optabs.cc                                 |  101 +-
> >  gcc/optabs.def                                |    6 +-
> >  gcc/optabs.h                                  |    2 +-
> >  gcc/sync-builtins.def                         |   40 +
> >  gcc/testsuite/gcc.dg/atomic-op-fp.c           |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf.c          |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf128.c       |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf16.c        |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf16b.c       |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf32.c        |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf32x.c       |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf64.c        |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpf64x.c       |  204 +++
> >  gcc/testsuite/gcc.dg/atomic-op-fpl.c          |  204 +++
> >  gcc/testsuite/lib/target-supports.exp         |  463 ++++++-
> >  libatomic/Makefile.am                         |    6 +-
> >  libatomic/Makefile.in                         |   12 +-
> >  libatomic/acinclude.m4                        |   49 +
> >  libatomic/auto-config.h.in                    |   84 +-
> >  libatomic/config/linux/aarch64/host-config.h  |    2 +
> >  libatomic/configure                           | 1153 ++++++++++++++++-
> >  libatomic/configure.ac                        |    4 +
> >  libatomic/fadd_n.c                            |   23 +
> >  libatomic/fop_n.c                             |    5 +-
> >  libatomic/fsub_n.c                            |   23 +
> >  libatomic/libatomic.map                       |   44 +
> >  libatomic/libatomic_i.h                       |   58 +
> >  libatomic/testsuite/Makefile.in               |    1 +
> >  .../testsuite/libatomic.c/atomic-op-fp.c      |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf.c     |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf128.c  |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf16.c   |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf16b.c  |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf32.c   |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf32x.c  |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf64.c   |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpf64x.c  |  203 +++
> >  .../testsuite/libatomic.c/atomic-op-fpl.c     |  203 +++
> >  libstdc++-v3/include/bits/atomic_base.h       |   16 +
> >  47 files changed, 6393 insertions(+), 112 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c
> >  create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c
> >  create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c
> >
> > --
> > 2.43.0
> >
>

Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

Reply via email to