Re: [PATCH 1/8] builtin: Add builtin types and function declarations for integer atomic fetch min/max

Matthew Malcomson Fri, 07 Nov 2025 02:44:11 -0800



On 11/4/25 12:43, Jakub Jelinek wrote:

External email: Use caution opening links or attachments


On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote:

We've been meaning to send this email for a while after the Cauldron.
IIUC you discussed this with Ramana there -- my understanding of what he
told me is that your main concern is with the explosion of builtins.


Yeah.

Similarly, what I understand is that if we reduce the number of builtins to
just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch`
and `__atomic_max_fetch` (which are the same builtins currently exposed by
clang) that seems more acceptable.

I wanted to double-check that this understanding of a conversation I'm
hearing second-hand is correct.


Yes.  Even min vs. max could be reduced to minmax with an extra argument
which determines which one it is, but I'm fine with those 4.
But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof
etc., let the compiler figure out the signedness and size (and will it
handle floating point too or not?).

Great, sticking with those four works for us (and matches what clang hasalready implemented).


Yes -- we expect to make these four work for floating point too.

Also I'd like to gather opinion (from you but also anyone who has an
opinion) about the following implementation design decisions:
1) Should we still implement the libatomic functions?  And still with the
unsigned/signed distinction and all sizes?
- I'd expect so, mostly for the `-fno-inline-atomics` flag.


Do you really need that?  Can't you just emit a CAS loop for the non-inline
atomics?  Because the function explosion will be there again on the
libatomic side.  Or at least don't add such symbols on arches which are
never going to benefit from those right now (i.e. all the implementations
would be CAS loops anyway).  Some function explosion can be limited by only
having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not
the {min,max}_fetch ones, because that case can be implemented on the caller
side by just returning the DESIRED argument separately.

We don't need libatomic -- we were suggesting implementing it on theguess that others would like it but are happy to drop that work and emita CAS loop from the compiler instead.

2) Earlier you floated the idea of using an internal function to encode the
operation through the rest of the compiler (outside of the frontend).  Does
that approach still seem good to you?


There are 2 options.
Lower the type-generic builtin into a CAS loop and pattern recognize it at
some late time (e.g. the widening_mul pass, certainly after IPA) into an IFN
if the corresponding optab is supported.
Or lower the type-generic builtin into IFN (ifns can have the min vs. max
argument and derive size and sign from the DESIRED argument) and at some
perhaps early (before IPA) point - forwprop? - pattern match a CAS loop into
the IFN too and then ideally shortly after IPA lower the IFN back into a CAS
loop if optab doesn't exist.
The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't
always know what the backend will be.

Ah -- I did not know about this OpenMP/OpenACC case to watch out for --thanks for raising it. I think that shows my implementation of the CASloop expansion in my floating point fetch add/sub patches to beproblematic. (I had put it in the frontend because the decision abouthow to handle floating point exceptions etc seemed a frontend-specificdecision).


------

Personally I like sound of lowering to an IFN before expanding it lateron the principle that if we do start to miss anything with our patternmatch we can at least ensure we don't miss anything the user explicitlyprovided as a fetch_min/fetch_max.

(I.e. the second option you suggested).

------

For clarity, in order to get some benefit in GCC 15 we have thefollowing goals (ordered by increasing amount of code changes we need towrite, also somewhat corresponding to how much we're aiming to get thatin by GCC 15):

- Have builtins used in libstdc++ (under SFINAE checks).
- Have builtins defined and emitting correct code for AArch64.
- Add the CAS pattern matching into an IFN.

Based on the above preferences we'd like to see if you'd be happy withthe first patchset to not include the CAS pattern matching.- N.b. This assumes going with the second option you suggested ofexpanding the IFN into a CAS loop if optab doesn't exist rather thanpattern matching if optab does exist. Hence if you'd want the firstoption do mention.


Thanks again,
Matthew

Re: [PATCH 1/8] builtin: Add builtin types and function declarations for integer atomic fetch min/max

Reply via email to