On Mon, 18 Nov 2024, Arnd Bergmann wrote:

> >> a) storing an '_Atomic' variable smaller than 8 bytes on non-bwx
> >>    targets should use ll/sc, but uses a plain rmw cycle.
> >
> > I think there should be no problem with aligned 4-byte (longword) data
> > quantities, given that there are both plain and locked/conditional load
> > and store machine operations provided by the non-BWX architecture.  Why do
> > you think otherwise?
>
> It's quite possible that I misremembered this and it was only
> a problem on 1-byte and 2-byte stores, even for _Atomic.

Fair enough.

> > This does use an LL/SC sequence.  Original non-BWX sequence uses plain
> > unaligned load/store operations instead:
>
> Right, with -msafe-bwa this would be covered, but I'm not sure
> if you'd expect to build all of userspace with that flag as well.

I think it is a reasonable plan, which I have already considered, but
chose not to propose updating compiler specs at this time to do this for
the Alpha/Linux targets by default.

> I can certainly see how one would argue that userspace doesn't
> need to use LL/SC for non-atomic subword access, but at the same

While POSIX.1-2024 has:

"If the process is multi-threaded, or if the process is single-threaded
and a signal handler is executed other than as the result of:

"[...]

"the behavior is undefined if:

"* The signal handler refers to any object other than `errno' with static
   or thread storage duration that is not a lock-free atomic object, and
   not a non-modifiable object (for example, string literals, objects
   that were defined with a const-qualified type, and objects in memory
   that is mapped read-only), other than by assigning a value to an
   object declared as `volatile sig_atomic_t', unless the previous
   modification (if any) to the object happens before the signal handler
   is called and the return from the signal handler happens before the
   next modification (if any) to the object."

and `sig_atomic_t' is typedef'd to `int' on Alpha/Linux (then again, what
about objects with allocated storage?), I have yet to find a reference on
what the requirements are for threads, if any.

It does not appear to me that people are supposed to expect data races
with accesses to non-atomic objects with static or allocated storage owing
to another thread accessing an unrelated non-atomic object just because
they land next to each other at link time, e.g. in BSS, especially if
those only happen with an obscure old CPU architecture.
Maybe less so when it comes to parts of an aggregate, but still I'm not
fully convinced we should permit this by default.

Therefore I think long-term it'll make sense to make Alpha/Linux default
to `-msafe-bwa -msafe-partial', while retaining the current code model for
use by environments such as bare metal that do not support concurrent or
parallel execution.

> time I think the RMW sequence on _Atomic variables is a clear
> bug that you'd need to fix also for -mno-safe-bwa.

That's weird indeed; the GCC internals manual clearly says:

"'atomic_storeMODE'
     This pattern implements an atomic store operation with memory model
     semantics.  Operand 0 is the memory address being stored to.
     Operand 1 is the value to be written.  Operand 2 is the memory
     model to be used for the operation.

     If not present, the '__atomic_store' built-in function will attempt
     to perform a normal store and surround it with any required memory
     fences.  If the store would not be atomic, then an
     '__atomic_exchange' is attempted with the result being ignored."

and while we do not have `atomic_storeqi' nor `atomic_storehi' operations
we do have `atomic_exchangeqi' and `atomic_exchangehi' ones defined in the
Alpha backend (for byte and word operations respectively).  So presumably
the middle end is not aware for some reason that on non-BWX Alpha normal
QI and HI mode stores are not atomic.

I do hope this obviously obscure but trivial GCC bug was not the *sole*
reason to drop non-BWX support from Linux.

> > To avoid triggering side effects Alpha system chipsets define a sparse
> > I/O space decoding window where data locations are spaced apart such that
> > no BWX operations are required to read or write individual 8-bit, or
> > 16-bit, or even 24-bit peripheral registers.  With DEC's own peripheral
> > bus solutions it may not have been always necessary, but surely it has
> > been to support PCI with the original Alpha implementation (EV4).
> > We do have support for sparse I/O space operations in Linux, we've
> > always had.
> >
> > Does my reply address your concerns?
>
> It does address the immediate concern about MMIO registers, but I
> think there is still an open question regarding what the correct
> behavior on volatile variables should be in the absence of -msafe-bwa.

My understanding of ISO C is that the `volatile' keyword serves as an
optimisation barrier for accesses to the object concerned.  Therefore the
compiler is not allowed to optimise away or merge loads or stores, but it
is not required to guarantee atomicity or that extra accesses will not be
produced (with LL/SC extra accesses are inevitable anyway).

  Maciej
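
For illustration only: a minimal host-side sketch in portable C of why a
plain read-modify-write byte store, as discussed above for non-BWX targets,
can lose an update to a neighbouring byte.  It does not reproduce actual
Alpha code generation; the names `rmw_begin', `rmw_finish' and
`lost_update_demo' are hypothetical, and the interleaving of two writers is
simulated sequentially rather than with real threads.

```c
#include <stdint.h>

/* Assumed model: a byte store without BWX is emulated by loading the
   containing aligned quadword, replacing one byte lane, and storing
   the whole quadword back.  */
typedef struct {
    uint64_t snapshot;              /* quadword as seen by the plain load */
} rmw_state;

/* Replace byte lane `lane' (0..7) of `q' with `v'. */
static uint64_t insert_byte(uint64_t q, unsigned lane, uint8_t v)
{
    uint64_t mask = (uint64_t)0xff << (8 * lane);
    return (q & ~mask) | ((uint64_t)v << (8 * lane));
}

/* Extract byte lane `lane' (0..7) of `q'. */
static uint8_t get_byte(uint64_t q, unsigned lane)
{
    return (uint8_t)(q >> (8 * lane));
}

/* Step 1 of the emulated store: plain load of the quadword. */
static rmw_state rmw_begin(const uint64_t *mem)
{
    rmw_state s = { *mem };
    return s;
}

/* Steps 2 and 3: merge the new byte into the snapshot, then store the
   whole quadword back with a plain store.  */
static void rmw_finish(uint64_t *mem, rmw_state s, unsigned lane, uint8_t v)
{
    *mem = insert_byte(s.snapshot, lane, v);
}

/* Writer A targets lane 0, writer B targets the unrelated lane 1.
   Both load before either stores, so A's late store writes back a
   stale copy of lane 1 and B's update is lost.  */
static uint64_t lost_update_demo(void)
{
    uint64_t mem = 0;
    rmw_state a = rmw_begin(&mem);  /* A loads the quadword */
    rmw_state b = rmw_begin(&mem);  /* B loads the same quadword */
    rmw_finish(&mem, b, 1, 0xBB);   /* B stores its byte */
    rmw_finish(&mem, a, 0, 0xAA);   /* A stores, clobbering lane 1 */
    return mem;
}
```

With an LL/SC sequence the second store would fail its conditional store
and retry against the updated quadword, so both bytes would survive.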