Hi Jakub, Torvald,
On 03/06/16 13:32, Jakub Jelinek wrote:
On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
And that would be fine, IMO. If you can't even load atomically, doing
something useful with this type will be hard except in special cases.
Also, doing a CAS (compare-and-swap) and thus potentially bringing in
the cache line in exclusive mode can be a lot more costly than what
users might expect from a load. A short critical section might not be
much slower.
If you only have a CAS as base of the atomic operations on a type, then
a CAS operation exposed to the user will still be just a single HW
CAS. But any other operation besides the CAS and a load will need *two*
CAS operations; even an atomic store has to be implemented as a CAS
loop.
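
To make the cost argument above concrete, here is a minimal sketch in C11 of
a type whose only atomic primitive is CAS. The names cas_only_u64,
cas_only_cas, cas_only_load, cas_only_store and cas_only_fetch_add are
hypothetical and invented for this illustration; this is not what GCC or
libatomic actually emit. A 64-bit integer stands in for the wide type so the
sketch builds without extra libraries.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: assume CAS is the only atomic primitive
   available for this width. */
typedef struct { _Atomic uint64_t v; } cas_only_u64;

/* The one primitive: maps to a single hardware CAS. */
static bool cas_only_cas(cas_only_u64 *p, uint64_t *expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(&p->v, expected, desired);
}

/* Load built from one CAS. If the current value is 0 the CAS succeeds and
   writes 0 back; otherwise it fails and copies the current value into
   'expected'. Either way 'expected' ends up holding the current value.
   Note that this is a write access, so it cannot be used on read-only
   memory. */
static uint64_t cas_only_load(cas_only_u64 *p)
{
    uint64_t expected = 0;
    cas_only_cas(p, &expected, expected);
    return expected;
}

/* Store as a CAS loop: one CAS to learn the old value, then at least one
   more to replace it. A failed CAS refreshes 'expected'. */
static void cas_only_store(cas_only_u64 *p, uint64_t desired)
{
    uint64_t expected = cas_only_load(p);
    while (!cas_only_cas(p, &expected, desired)) {
        /* retry with the refreshed 'expected' */
    }
}

/* Read-modify-write follows the same pattern; returns the old value. */
static uint64_t cas_only_fetch_add(cas_only_u64 *p, uint64_t n)
{
    uint64_t old = cas_only_load(p);
    while (!cas_only_cas(p, &old, old + n)) {
        /* 'old' was refreshed; 'old + n' is recomputed on the next try */
    }
    return old;
}
```

In this sketch a user-visible CAS stays a single cas_only_cas call, while
the store and the fetch-add each need at least two, and even the load has to
take the cache line for writing.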
Would we just stop expanding all those __atomic_*/__sync_* builtins inline
then (which would IMHO break tons of stuff), or just some predicate that
atomic.h/atomic headers use?
But doesn't that mean you should fall back to locked operation also for any
other atomic operation on such types, because otherwise if you atomic_store
or any other kind of atomic operation, it wouldn't use the locking, while
for atomic load it would?
I suppose you mean that one must fall back to using locking for all
operations? If load isn't atomic, then it can't be made atomic using
external locks if the other operations don't use the locks.
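
As a rough sketch of that point: once the load takes a lock, every other
operation on the object has to take the same lock, otherwise a store could
slip in between the two halves of the load. The type and function names
below (pair128, locked_pair, lp_load, lp_store, lp_cas) are hypothetical,
and the spinlock is only a stand-in for whatever lock a real fallback
library would use.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wide value that the hardware cannot load atomically. */
typedef struct { uint64_t lo, hi; } pair128;

/* The value together with the lock that guards ALL accesses to it. */
typedef struct {
    atomic_flag lock;
    pair128 value;
} locked_pair;

#define LOCKED_PAIR_INIT { ATOMIC_FLAG_INIT, { 0, 0 } }

static void lp_lock(locked_pair *p)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire)) {
    }
}

static void lp_unlock(locked_pair *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static pair128 lp_load(locked_pair *p)
{
    lp_lock(p);
    pair128 v = p->value;   /* both halves read under the lock */
    lp_unlock(p);
    return v;
}

/* The store must use the same lock; a lock-free store here would make
   lp_load non-atomic again. */
static void lp_store(locked_pair *p, pair128 v)
{
    lp_lock(p);
    p->value = v;
    lp_unlock(p);
}

/* Compare-and-swap under the lock. Returns 1 on success; on failure
   returns 0 and copies the current value into *expected. */
static int lp_cas(locked_pair *p, pair128 *expected, pair128 desired)
{
    int ok;
    lp_lock(p);
    ok = (p->value.lo == expected->lo && p->value.hi == expected->hi);
    if (ok)
        p->value = desired;
    else
        *expected = p->value;
    lp_unlock(p);
    return ok;
}
```

Mixing this with an inline wide CAS on the same object would break the
guarantee, which is why switching only the load to locking is not enough.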
That would be an ABI change and quite significant
pessimization in many cases.
A change from wide CAS to locking would be an ABI change I suppose, but
it could also be considered a necessary bugfix if we don't want to write
to read-only memory. Does this affect anything but i686?
Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
(judging from who initiated this thread), I bet there are many others.
I'm looking at pre-LPAE ARMv7-A targets for which the
ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends:
"The way to atomically load two 32-bit quantities is to perform a
LDREXD/STREXD sequence, reading and writing the same value, for which the
STREXD succeeds, and use the read values."
Currently we emit just a single load-doubleword-exclusive which, according to
the above, would not be enough on such targets.
On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 70814).
Kyrill
Jakub