On Fri, 2017-02-03 at 17:21 +0100, Jakub Jelinek wrote:
> On Fri, Feb 03, 2017 at 04:19:58PM +0000, Ramana Radhakrishnan wrote:
> > > > Would it be acceptable for those users to have loads that perform like
> > > > CAS loops, especially under contention? Or are these users more
> > > > concerned about aarch64 not offering a true atomic 16-byte load?
> > >
> > > Can the store you need for atomicity be into an automatic var on the
> > > stack?
> >
> > No, it has to be to the same location.
>
> But then it is the same problem as using cmpxchg16b on x86_64, the location
> could be read-only, or that it is too slow otherwise for what users expect
> for atomic load.

It would be the same problem. I was merely interested in the needs and
concerns of those users that Ramana mentioned, regardless of whether
these needs could be addressed in the scope of the __atomic builtins.
For example, if those users just need fast atomic read-modify-write
operations but not actually pure loads in their use cases (e.g.,
reductions in a parallel workload), then something other than __atomic
could provide that.
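
To make the two cases concrete, here is a minimal sketch of what a load
implemented as a CAS looks like, next to a plain read-modify-write of the
kind a reduction would use. The helper names load_via_cas and accumulate
are hypothetical, and uint64_t is used only to keep the sketch portable;
the 16-byte case has the same shape but may need -mcx16 or libatomic,
depending on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: emulate an atomic load with a compare-and-swap.
 * Assumption: uint64_t stands in for the 16-byte type under discussion. */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;  /* arbitrary guess at the current value */

    /* If *p == expected, the CAS stores `expected` back, leaving the value
     * unchanged. Otherwise it fails and copies the current value of *p
     * into `expected`. Either way, `expected` ends up holding a value that
     * was observed atomically. */
    __atomic_compare_exchange_n(p, &expected, expected, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Hypothetical helper: a reduction step only needs an atomic
 * read-modify-write, never a pure load. */
static void accumulate(uint64_t *sum, uint64_t value)
{
    __atomic_fetch_add(sum, value, __ATOMIC_RELAXED);
}
```

Note that load_via_cas takes a non-const pointer: the CAS needs write
access to the very location being read, which is why such a load cannot
be used on read-only memory and why it can behave poorly under
contention.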