On Mon, 2018-02-26 at 19:39 +0000, Ruslan Nikolaev via gcc wrote:
> Torvald, I definitely do not want to insist on this design choice, but it
> makes sense to at least seriously consider it given the concerns I
> described. And especially because IFUNC in libatomic already redirects to
> cmpxchg16b,

That's because we want to keep the old (but wrong) behavior unchanged,
at least for existing code, until there's a time when we switch it.
Given that we also don't declare those types lock-free anymore, new code
will know that it's not safe to use them for cases such as inter-process
communication (because if not lock-free, it's also not address-free
anymore).

> > Not getting the performance usually associated with atomic loads can be
> > a big problem for code that tries to be portable.
>
> I do not think it is a common use case anyway. How often is atomic_load used
> on double-width operations?

A portable program doesn't have to think about things like double-width,
or whether the platform is something like x32 vs. x86-64. What a
portable program cares about is whether atomic ops are lock-free on a
particular 64b integer type or not. If they are, you want to use them
to synchronize (e.g., counters), and then it can matter a lot whether a
load is actually a load or just creates lots of contention. If they
aren't available, the program knows that it has to find a different way
to synchronize (e.g., build the 64b counter out of 32b operations).

> If a programmer needs some guarantees and does not care about lock-freedom,
> why not use a regular lock here?

They do care about whether atomic operations are natively supported on
that particular type -- and that should include a load.

> This way nothing magical happens. Otherwise, he may hit unexpected
> issues in places like signal handlers (which is hard to debug since it will
> hang only once in a while). With cmpxchg16b, it is at least more or less
> reproducible: if you tried to use it on read-only memory, you will
> immediately get a segfault.

Nobody is proposing to mark things as lock-free if they aren't. Thus, I
don't see any change to what's usable in signal handlers.
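
To make the point above about portable programs concrete, here is a
minimal sketch of how such a program might decide whether to use a 64b
atomic counter. It assumes a C11 compiler with <stdatomic.h>; the names
event_count, event_count_is_native, event_count_add and event_count_read
are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* ATOMIC_LLONG_LOCK_FREE is 0 (never), 1 (sometimes) or 2 (always).
 * 0 means there is no native support: refuse to build this variant and
 * pick another design, e.g. a 64b counter built from 32b operations. */
static_assert(ATOMIC_LLONG_LOCK_FREE != 0,
              "no native 64b atomics: use a different synchronization scheme");

/* Hypothetical shared counter; the name is illustrative only. */
static atomic_ullong event_count;

/* 1 means "sometimes lock-free", so also ask about this object at run time. */
static bool event_count_is_native(void)
{
    return atomic_is_lock_free(&event_count);
}

static void event_count_add(unsigned long long n)
{
    atomic_fetch_add_explicit(&event_count, n, memory_order_relaxed);
}

/* The expectation discussed above: on a natively supported type this is
 * a real load, not a CAS loop or a lock acquisition. */
static unsigned long long event_count_read(void)
{
    return atomic_load_explicit(&event_count, memory_order_relaxed);
}
```

Note that the program never asks about "double-width" or about the ABI;
it only asks whether atomics on this particular 64b type are lock-free.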

> > I think I now remember why we "didn't fix" libatomic: There might be
> > compiled code out there that does use the wide CAS, so changing
> > libatomic from the status quo to using its internal locks could break
> > programs.
> Well, it already happens for Linux and glibc. There nothing will break. For
> other architectures, it would be good to implement the same, so that
> consistent behavior is observed everywhere.

It's not about consistency across archs, but consistency for existing
code. New code or new implementations should just do the right thing,
which is requiring a natively supported atomic load of the particular
size/alignment.

> > No, they only said that it doesn't need to be a concern for the
> > standard. Implementations have to pay attention to more things, so it
> > is a concern for implementation.
> Yes, but the only problem I see is that it is currently placed in .rodata
> when const is used.

I and others are of a different opinion: Load performance matters,
inter-process communication on read-only memory matters, and it's useful
to have the builtins work on not just _Atomic types but general integer
types with proper alignment (e.g., look at how glibc uses the builtins
in a code base that is not C11 or more recent).

> It is easy to resolve: just do not place it there for _Atomic objects > 8
> bytes. Then also clarify that a programmer cannot safely cast some arbitrary
> object that can be placed in .rodata to use with atomic_load.

That doesn't help with the use cases I listed previously.

> It needs to be addressed anyway, as there is already a segfault for the
> provided example in x86-64 and Linux even with redirection to libatomic.
>
> > It's not "visible" in the abstract machine under some setting of the
> > as-if rule. But it is definitely visible in an implementation in which
> > the effects of read-only memory are visible (see my example of mapping
> > memory from another process read-only so as to read data from that
> > process).
> True but it is not defined for read-only memory anyway,

The standard doesn't specify read-only memory, so it also doesn't forbid
the concept. The implementation takes it into account though, and thus
it's defined in that context.

> and no assumptions can be made in portable code.

No, you can make assumptions, given what we want the implementation to
do. We might need to explain that better (or at all) in the docs, but
the idea is that *new* code can expect lock-free atomics to both have a
true atomic load (i.e., performance-wise) and have loads work on
read-only-mapped memory.
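
As a rough illustration of the read-only-mapping case, the sketch below
maps the same temporary file twice within one process, once writable and
once read-only, and performs an atomic load through the read-only
mapping. It assumes a POSIX system and the GCC __atomic builtins on a
plain uint64_t; the function name load_via_readonly_mapping is
hypothetical, and a real inter-process setup would map memory owned by
another process instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Only meaningful where the 64b type is natively supported, so that the
 * load is a real load and needs no write access to the page. */
static_assert(__atomic_always_lock_free(sizeof(uint64_t), 0),
              "expected lock-free 64b atomics on this target");

/* Stores `value` through a writable mapping and reads it back through a
 * second, read-only mapping of the same file. Returns 0 on success. */
static int load_via_readonly_mapping(uint64_t value, uint64_t *out)
{
    int rc = -1;
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    int fd = fileno(f);
    if (ftruncate(fd, sizeof(uint64_t)) == 0) {
        uint64_t *rw = mmap(NULL, sizeof(uint64_t), PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        uint64_t *ro = mmap(NULL, sizeof(uint64_t), PROT_READ,
                            MAP_SHARED, fd, 0);
        if (rw != MAP_FAILED && ro != MAP_FAILED) {
            __atomic_store_n(rw, value, __ATOMIC_RELEASE);
            /* A true atomic load only reads, so PROT_READ is enough. An
             * implementation that emulated the load with a CAS would
             * write to the page and fault here. */
            *out = __atomic_load_n(ro, __ATOMIC_ACQUIRE);
            rc = 0;
        }
        if (rw != MAP_FAILED)
            munmap(rw, sizeof(uint64_t));
        if (ro != MAP_FAILED)
            munmap(ro, sizeof(uint64_t));
    }
    fclose(f);
    return rc;
}
```

The builtins are used on a plain integer type here rather than an
_Atomic one, in line with the glibc-style usage mentioned earlier.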