On Mon, 18 Nov 2024, Arnd Bergmann wrote:

> >> a) storing an '_Atomic' variable smaller than 8 bytes on non-bwx
> >>    targets should use ll/sc, but uses a plain rmw cycle.
> >
> >  I think there should be no problem with aligned 4-byte (longword) data
> > quantities, given that there are both plain and locked/conditional load 
> > and store machine operations provided by the non-BWX architecture.  Why do 
> > you think otherwise?
> 
> It's quite possible that I misremembered this and it was only
> a problem on 1-byte and 2-byte stores, even for _Atomic.

 Fair enough.

> > This does use an LL/SC sequence.  Original non-BWX sequence uses plain 
> > unaligned load/store operations instead:
> 
> Right, with -msafe-bwa this would be covered, but I'm not sure
> if you'd expect to build all of userspace with that flag as well.

 I think it is a reasonable plan, and one I have already considered, but I 
chose not to propose updating the compiler specs at this time to do this 
by default for the Alpha/Linux targets.

> I can certainly see how one would argue that userspace doesn't
> need to use LL/SC for non-atomic subword access, but at the same

 While POSIX.1-2024 has:

"If the process is multi-threaded, or if the process is single-threaded 
and a signal handler is executed other than as the result of:

"[...]

"the behavior is undefined if:

"* The signal handler refers to any object other than `errno' with static 
   or thread storage duration that is not a lock-free atomic object, and 
   not a non-modifiable object (for example, string literals, objects that 
   were defined with a const-qualified type, and objects in memory that is 
   mapped read-only), other than by assigning a value to an object 
   declared as `volatile sig_atomic_t', unless the previous modification 
   (if any) to the object happens before the signal handler is called and 
   the return from the signal handler happens before the next modification 
   (if any) to the object."

and `sig_atomic_t' is typedef'd to `int' on Alpha/Linux (then again, what 
about objects with allocated storage?), I have yet to find a reference on 
what the requirements are for threads, if any.
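
 For illustration only, a minimal sketch of the one pattern the quoted 
text explicitly permits: a handler assigning a value to an object declared 
as `volatile sig_atomic_t'.  The names `got_signal', `on_signal' and 
`signal_flag_demo' are made up for this example, and raise() is used 
merely to deliver the signal at a known point.

```c
#include <signal.h>

/* The only kind of static object the handler below writes to.  */
static volatile sig_atomic_t got_signal;

/* Handler: assigns a value to a `volatile sig_atomic_t' object and
   does nothing else.  */
static void
on_signal (int signo)
{
  (void) signo;
  got_signal = 1;
}

/* Hypothetical helper: installs the handler, raises SIGUSR1 and
   returns the flag value, or -1 on failure.  */
int
signal_flag_demo (void)
{
  got_signal = 0;
  if (signal (SIGUSR1, on_signal) == SIG_ERR)
    return -1;
  if (raise (SIGUSR1) != 0)
    return -1;
  return got_signal;
}
```

 With `sig_atomic_t' being `int', the assignment in the handler is an 
aligned longword store, which needs no read-modify-write sequence even 
without BWX.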

 It does not appear to me that people are supposed to expect data races 
with accesses to non-atomic objects with static or allocated storage owing 
to another thread accessing an unrelated non-atomic object just because 
they land next to each other at link time, e.g. in BSS, especially if 
those only happen with an obscure old CPU architecture.  Maybe less so 
when it comes to parts of an aggregate, but still I'm not fully convinced 
we should permit this by default.
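
 As a sketch of the scenario, assuming that two unrelated one-byte objects 
happen to be placed in the same aligned quadword (nothing guarantees that 
they will be), consider the following.  The names `counter_a', `counter_b', 
`bump' and `run_neighbours' are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 1000000

/* Two unrelated one-byte objects.  Nothing guarantees that the linker
   places them in the same aligned quadword; the sketch assumes it
   might.  */
static volatile unsigned char counter_a;
static volatile unsigned char counter_b;

/* Each thread only ever touches the counter it was handed.  */
static void *
bump (void *arg)
{
  volatile unsigned char *counter = arg;

  for (int i = 0; i < ITERATIONS; i++)
    (*counter)++;
  return NULL;
}

/* Returns 0 if neither counter lost an update, 1 if one did, and -1 if
   the threads could not be started.  */
int
run_neighbours (void)
{
  pthread_t ta, tb;

  counter_a = counter_b = 0;
  if (pthread_create (&ta, NULL, bump, (void *) &counter_a) != 0)
    return -1;
  if (pthread_create (&tb, NULL, bump, (void *) &counter_b) != 0)
    {
      pthread_join (ta, NULL);
      return -1;
    }
  pthread_join (ta, NULL);
  pthread_join (tb, NULL);

  /* The counters wrap modulo 256, so compare with the wrapped count.  */
  unsigned char expected = (unsigned char) ITERATIONS;
  return !(counter_a == expected && counter_b == expected);
}
```

 On a target with byte stores each thread writes its own byte only and 
run_neighbours() is expected to return 0.  Where a byte store is a plain 
load/modify/store of the enclosing quadword, one thread can write back a 
stale copy of the other thread's byte, and updates can be lost although 
the program never shares an object between threads.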

 Therefore I think long-term it'll make sense to make Alpha/Linux default 
to `-msafe-bwa -msafe-partial', while retaining the current code model for 
use by environments such as bare metal that do not support concurrent or 
parallel execution.

> time I think the RMW sequence on _Atomic variables is a clear
> bug that you'd need to fix also for -mno-safe-bwa.

 That's weird indeed; the GCC internals manual clearly says:

"'atomic_storeMODE'
     This pattern implements an atomic store operation with memory model
     semantics.  Operand 0 is the memory address being stored to.
     Operand 1 is the value to be written.  Operand 2 is the memory
     model to be used for the operation.

     If not present, the '__atomic_store' built-in function will attempt
     to perform a normal store and surround it with any required memory
     fences.  If the store would not be atomic, then an
     '__atomic_exchange' is attempted with the result being ignored."

and while we have neither `atomic_storeqi' nor `atomic_storehi' operations 
we do have `atomic_exchangeqi' and `atomic_exchangehi' ones defined in the 
Alpha backend (for byte and word operations respectively).  So presumably 
the middle end is not aware for some reason that on non-BWX Alpha normal 
QI and HI mode stores are not atomic.
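
 At the source level the two routes correspond roughly to the sketch 
below, which uses the documented `__atomic_store_n' and 
`__atomic_exchange_n' GCC built-ins.  The wrapper names are hypothetical 
and which pattern the middle end actually selects for a given target is 
not something this example can show.

```c
#include <stdint.h>

/* What the program asks for: an atomic one-byte store.  */
void
store_byte (uint8_t *p, uint8_t v)
{
  __atomic_store_n (p, v, __ATOMIC_SEQ_CST);
}

/* What the documented fallback amounts to: an atomic exchange with the
   previous value discarded.  */
void
store_byte_via_exchange (uint8_t *p, uint8_t v)
{
  (void) __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}
```

 Either function must leave the neighbouring bytes untouched from the 
point of view of other threads.  Comparing the assembly produced for the 
two with a non-BWX CPU selected would be one way to check whether the 
first one ends up as a plain read-modify-write sequence.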

 I do hope this obviously obscure but trivial GCC bug was not the *sole* 
reason to drop non-BWX support from Linux.

> >  To avoid triggering side effects Alpha system chipsets define a sparse 
> > I/O space decoding window where data locations are spaced apart such that 
> > no BWX operations are required to read or write individual 8-bit, or 
> > 16-bit, or even 24-bit peripheral registers.  With DEC's own peripheral 
> > bus solutions it may not have been always necessary, but surely it has 
> > been to support PCI with the original Alpha implementation (EV4).  We do 
> > have support for sparse I/O space operations in Linux, we've always had.
> >
> >  Does my reply address your concerns?
> 
> It does address the immediate concern about MMIO registers, but I
> think there is still an open question regarding what the correct
> behavior on volatile variables should be in the absence of -msafe-bwa.

 My understanding of ISO C is that the `volatile' keyword serves as an 
optimisation barrier for accesses to the object concerned.  Therefore the 
compiler is not allowed to optimise away or merge loads or stores, but it 
is not required to guarantee atomicity or that extra accesses will not be 
produced (with LL/SC extra accesses are inevitable anyway).

  Maciej
