On 11/4/25 12:43, Jakub Jelinek wrote:
External email: Use caution opening links or attachments
On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote:
We've been meaning to send this email for a while after the Cauldron.
IIUC you discussed this with Ramana there -- my understanding of what he
told me is that your main concern is with the explosion of builtins.
Yeah.
Similarly, what I understand is that if we reduce the number of builtins to
just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch`
and `__atomic_max_fetch` (which are the same builtins currently exposed by
clang) that seems more acceptable.
I wanted to double-check that this understanding of a conversation I'm
hearing second-hand is correct.
Yes. Even min vs. max could be reduced to minmax with an extra argument
which determines which one it is, but I'm fine with those 4.
But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof
etc., let the compiler figure out the signedness and size (and will it
handle floating point too or not?).
Great, sticking with those four works for us (and matches what clang has
already implemented).
Yes -- we expect to make these four work for floating point too.
Also I'd like to gather opinion (from you but also anyone who has an
opinion) about the following implementation design decisions:
1) Should we still implement the libatomic functions? And still with the
unsigned/signed distinction and all sizes?
- I'd expect so, mostly for the `-fno-inline-atomics` flag.
Do you really need that? Can't you just emit a CAS loop for the non-inline
atomics? Because the function explosion will be there again on the
libatomic side. Or at least don't add such symbols on arches which are
never going to benefit from those right now (i.e. all the implementations
would be CAS loops anyway). Some function explosion can be limited by only
having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not
the {min,max}_fetch ones, because that case can be implemented on the caller
side by just returning the DESIRED argument separately.
We don't need libatomic -- we were suggesting implementing it on the
guess that others would like it but are happy to drop that work and emit
a CAS loop from the compiler instead.
2) Earlier you floated the idea of using an internal function to encode the
operation through the rest of the compiler (outside of the frontend). Does
that approach still seem good to you?
There are 2 options.
Lower the type-generic builtin into a CAS loop and pattern recognize it at
some late time (e.g. the widening_mul pass, certainly after IPA) into an IFN
if the corresponding optab is supported.
Or lower the type-generic builtin into IFN (ifns can have the min vs. max
argument and derive size and sign from the DESIRED argument) and at some
perhaps early (before IPA) point - forwprop? - pattern match a CAS loop into
the IFN too and then ideally shortly after IPA lower the IFN back into a CAS
loop if optab doesn't exist.
The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't
always know what the backend will be.
Ah -- I did not know about this OpenMP/OpenACC case to watch out for --
thanks for raising it. I think that shows my implementation of the CAS
loop expansion in my floating point fetch add/sub patches to be
problematic. (I had put it in the frontend because the decision about
how to handle floating point exceptions etc seemed a frontend-specific
decision).
------
Personally I like sound of lowering to an IFN before expanding it later
on the principle that if we do start to miss anything with our pattern
match we can at least ensure we don't miss anything the user explicitly
provided as a fetch_min/fetch_max.
(I.e. the second option you suggested).
------
For clarity, in order to get some benefit in GCC 15 we have the
following goals (ordered by increasing amount of code changes we need to
write, also somewhat corresponding to how much we're aiming to get that
in by GCC 15):
- Have builtins used in libstdc++ (under SFINAE checks).
- Have builtins defined and emitting correct code for AArch64.
- Add the CAS pattern matching into an IFN.
Based on the above preferences we'd like to see if you'd be happy with
the first patchset to not include the CAS pattern matching.
- N.b. This assumes going with the second option you suggested of
expanding the IFN into a CAS loop if optab doesn't exist rather than
pattern matching if optab does exist. Hence if you'd want the first
option do mention.
Thanks again,
Matthew