Hi Jakub and Jonathan, Pinging on this again to ask for confirmation.
As Matthew mentioned earlier, does it sound OK to have patches in this order: > - Have builtins used in libstdc++ (under SFINAE checks). > - Have builtins defined and emitting correct code for AArch64. > - Add the CAS pattern matching into an IFN. We'd like to continue with the 2nd option that Jakub suggested — lowering the generic builtin to an IFN, and converting that to a CAS loop after IPA if the backend does not support a direct optab. --- As for changes on the libstdc++ side, since we cannot emit a direct call to the builtins and need to resolve these appropriately under SFINAE, how do we handle implemnting the SFINAE check for __atomic_base? I would like to ideally emit less calls to the compiler builtins and thus have a single check for the builtin, the way it is currently implemented in __atomic_impl. What would be a neater way to tackle this? - Forward declaration of just the min/max functions from __atomic_impl and resolve them using the __atomic_impl implementation for __atomic_base - Move the SFINAE checks outside __atomic_impl. This messes with the current philosophy of __atomic_impl though, as I understand it - Resolve the integer types using a separate SFINAE check. Would result in code duplication though - Any other suggestion I'm missing? Lastly, how do we guard these for C++26? Thanks, Soumya > On 7 Nov 2025, at 4:13 PM, Matthew Malcomson <[email protected]> wrote: > > > > On 11/4/25 12:43, Jakub Jelinek wrote: >> External email: Use caution opening links or attachments >> On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote: >>> We've been meaning to send this email for a while after the Cauldron. >>> IIUC you discussed this with Ramana there -- my understanding of what he >>> told me is that your main concern is with the explosion of builtins. >> Yeah. >>> Similarly, what I understand is that if we reduce the number of builtins to >>> just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch` >>> and `__atomic_max_fetch` (which are the same builtins currently exposed by >>> clang) that seems more acceptable. >>> >>> I wanted to double-check that this understanding of a conversation I'm >>> hearing second-hand is correct. >> Yes. Even min vs. max could be reduced to minmax with an extra argument >> which determines which one it is, but I'm fine with those 4. >> But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof >> etc., let the compiler figure out the signedness and size (and will it >> handle floating point too or not?). > > Great, sticking with those four works for us (and matches what clang has > already implemented). > > Yes -- we expect to make these four work for floating point too. > >>> Also I'd like to gather opinion (from you but also anyone who has an >>> opinion) about the following implementation design decisions: >>> 1) Should we still implement the libatomic functions? And still with the >>> unsigned/signed distinction and all sizes? >>> - I'd expect so, mostly for the `-fno-inline-atomics` flag. >> Do you really need that? Can't you just emit a CAS loop for the non-inline >> atomics? Because the function explosion will be there again on the >> libatomic side. Or at least don't add such symbols on arches which are >> never going to benefit from those right now (i.e. all the implementations >> would be CAS loops anyway). Some function explosion can be limited by only >> having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not >> the {min,max}_fetch ones, because that case can be implemented on the caller >> side by just returning the DESIRED argument separately. > > We don't need libatomic -- we were suggesting implementing it on the guess > that others would like it but are happy to drop that work and emit a CAS loop > from the compiler instead. > >>> 2) Earlier you floated the idea of using an internal function to encode the >>> operation through the rest of the compiler (outside of the frontend). Does >>> that approach still seem good to you? >> There are 2 options. >> Lower the type-generic builtin into a CAS loop and pattern recognize it at >> some late time (e.g. the widening_mul pass, certainly after IPA) into an IFN >> if the corresponding optab is supported. >> Or lower the type-generic builtin into IFN (ifns can have the min vs. max >> argument and derive size and sign from the DESIRED argument) and at some >> perhaps early (before IPA) point - forwprop? - pattern match a CAS loop into >> the IFN too and then ideally shortly after IPA lower the IFN back into a CAS >> loop if optab doesn't exist. >> The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't >> always know what the backend will be. > > Ah -- I did not know about this OpenMP/OpenACC case to watch out for -- > thanks for raising it. I think that shows my implementation of the CAS loop > expansion in my floating point fetch add/sub patches to be problematic. (I > had put it in the frontend because the decision about how to handle floating > point exceptions etc seemed a frontend-specific decision). > > ------ > Personally I like sound of lowering to an IFN before expanding it later on > the principle that if we do start to miss anything with our pattern match we > can at least ensure we don't miss anything the user explicitly provided as a > fetch_min/fetch_max. > (I.e. the second option you suggested). > > ------ > For clarity, in order to get some benefit in GCC 15 we have the following > goals (ordered by increasing amount of code changes we need to write, also > somewhat corresponding to how much we're aiming to get that in by GCC 15): > - Have builtins used in libstdc++ (under SFINAE checks). > - Have builtins defined and emitting correct code for AArch64. > - Add the CAS pattern matching into an IFN. > > Based on the above preferences we'd like to see if you'd be happy with the > first patchset to not include the CAS pattern matching. > - N.b. This assumes going with the second option you suggested of expanding > the IFN into a CAS loop if optab doesn't exist rather than pattern matching > if optab does exist. Hence if you'd want the first option do mention. > > Thanks again, > Matthew
