On Mon, 2018-02-26 at 07:24 +0000, Ruslan Nikolaev via gcc wrote: > Alexander, > Thank you for your comments. Please see my response below. I definitely do > not want to fight for or against this change in gcc, but there are definitely > legitimate concerns to consider. I think, it would really be good to > consider this change to make things more compatible (i.e., at least between > clang/llvm and gcc which can be both used within the same ecosystem). There > are real practical benefits of having true lock-free double-width operations > when implementing algorithms that rely on ABA tagging for pointers, and C11 > at last gives an opportunity to do that without resorting to assembly or > platform-specific implementations.
I agree a wide CAS can be useful, but that has to be weighed against all other use cases and how they'd be affected by the change you propose. Not getting the performance usually associated with atomic loads can be a big problem for code that tries to be portable. > > Note that there's more issues to that than just behavior on readonly memory: > > you need to ensure that the whole program, including all static and shared > > libraries, is compiled with -mcx16 (and currently there's no ld.so/ld-level > > support to ensure that), or you'd need to be sure that it's safe to mix code > > compiled with different -mcx16 settings because it never happens to interop > > on wide atomic objects. > > Well, if libatomic is already doing it when corresponding CPU feature is > available (i.e., effectively implementing operations using cmpxchg16b), I do > not see any problem here. mcx16 implies that you *have* cmpxchg16b, therefore > other code compiled without -mcx16 flag will go to libatomic. Inside > libatomic, it will detect that cmpxchg16b *is* available, thus making code > compiled with and without -mcx16 flag completely compatible on a given > system. Or do I miss something here? I think I now remember why we "didn't fix" libatomic: There might be compiled code out there that does use the wide CAS, so changing libatomic from the status quo to using its intenral locks could break programs. In contrast, only redirecting to libatomic and not promising lock-free anymore doesn't break these programs, but it gives us the opportunity to fix this in the future; because we don't advertise it those operations as lock-free anymore, we also make new programs aware that they won't get the default set of native atomic operations, and thus prevent new programs from running into this problem. > If you do not have cmpxchg16b, but the program is compiled with the flag, it > will simply not run (as expected). > > So, in other words, libatomic should still decide whether you have cmpxchg16b > or not for cases when -mcx16 is not specified. But if it is specified, > cmpxchg16b can be generated unconditionally. If you want better > compatibility, you will not specify the flag. Mix of -mcx16 and mno-cx16 will > be, thus, binary compatible. > > > Note that there's no "load" function in the __sync family, so the original > > concern about operations on readonly memory does not apply. > Yes, but per clarification from WG14/C11, read-only memory should not be a > concern at all, No, they only said that it doesn't need to be a concern for the standard. Implementations have to pay attention to more things, so it is a concern for implementation. > as this behavior is not specified anyway (regardless of the const specifier). > Read-modify-write is allowed for atomic_load as long as there is no 'visible' > change on the value being loaded. It's not "visible" in the abstract machine under some setting of the as-if rule. But it is definitely visible in an implementation in which the effects of read-only memory are visible (see my example of mapping memory from another process read-only so as to read data from that process). > In this sense, the bug that was filed previously regarding read-only memory > accesses and const specifier does not seem to be valid. > Additionally, it is really odd and counterintuitive to still provide support > for (almost) deprecated macros while not giving such an opportunity for newer > and more advanced functions. > > > You don't mention it directly, so just to make it clear for readers: on > > systems > > where GNU IFUNC extension is available (i.e. on Glibc), libatomic tries to > > do > > exactly that: test for cmpxchg16b availability and redirect 128-bit atomics > > to > > lock-free RMW implementations if so. (I don't like this solution) > > Yes, but libatomic makes things slower due to indirection. Also, it is much > harder to track what is going on, as there is no guarantee of lock-freedom in > this case. BTW -- The fact that it currently uses cmpxchg16b if available may > actually be helpful to switch to the suggested behavior without breaking > binary compatibility (if I understand everything correctly). It's rather done that way to switch away from the previous behavior but in a manner that's less likely to break existing programs.