On 02/03/2017 05:44 AM, Ramana Radhakrishnan wrote:
On 02/02/17 15:21, Torvald Riegel wrote:
On Thu, 2017-02-02 at 14:48 +0000, Ramana Radhakrishnan wrote:
On 30/01/17 18:54, Torvald Riegel wrote:
This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads from read-only-mapped memory; it
also creates a lot of contention in the case of concurrent atomic loads,
which results in performance that is, at the least, counter-intuitive,
because most users probably understand "lock-free" to mean hardware-backed
(and thus "fast") rather than merely lock-free in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html


Will Deacon pointed me at this thread asking if something similar could
be done on ARM.

It would be nice if someone more familiar with ARM could double-check
that ARM is not affected.  I guess ARM isn't, but that's based on me
looking at machine descriptions, which I hadn't ever done before working
on this patch...


ARM doesn't have __int128 support, so I don't think the problem exists there.

On ARM, for architecture levels (i.e. arch < armv6k) that do not have
single-copy atomic instructions, we end up calling the kernel helper
routines, where the kernel does the appropriate handling depending on
whether you are on a multicore system or not.
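
For reference, a minimal sketch of that mechanism, with addresses and return
semantics as documented in the kernel's Documentation/arm/kernel_user_helpers.txt
(the fetch_and_add_one wrapper is a made-up example):

/* On pre-ARMv6K Linux/ARM, userspace atomics go through helpers the
   kernel maps at fixed addresses in the vector page; the cmpxchg
   helper lives at 0xffff0fc0 and returns 0 iff the store happened.
   The kernel picks an implementation suited to the CPU (e.g. a
   preemption-safe sequence on uniprocessor, ldrex/strex on SMP).  */
typedef int (*__kuser_cmpxchg_t) (int oldval, int newval,
                                  volatile int *ptr);
#define __kuser_cmpxchg ((__kuser_cmpxchg_t) 0xffff0fc0)

/* Made-up example: an atomic fetch-and-increment built on the helper.  */
static int
fetch_and_add_one (volatile int *p)
{
  int old;
  do
    old = *p;
  while (__kuser_cmpxchg (old, old + 1, p) != 0);
  return old;
}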

__atomic_load on ARM appears to be OK as well, except for
__atomic_load_di, which should really be the ldrexd / strexd loop; we
could ameliorate that along the lines of your option 3b.

No, look again: ldrexd has 64-bit single-copy atomic semantics WITHOUT requiring the strexd. It's only the AArch64 (64-bit) LDXP that requires the store.
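
In other words, a 64-bit atomic load on ARM can be a bare ldrexd. A minimal
sketch, assuming ARMv6K+ and relaxed memory order (the inline-asm constraints
follow the pattern the Linux kernel uses for its atomic64 read):

/* 64-bit atomic load: LDREXD alone is single-copy atomic per the
   correction above, so no STREXD -- and hence no store, no write
   access, and no retry loop -- is needed.  %H0 names the high
   register of the 64-bit register pair.  */
static inline unsigned long long
atomic_load_u64_relaxed (const volatile unsigned long long *p)
{
  unsigned long long v;
  __asm__ volatile ("ldrexd %0, %H0, [%1]"
                    : "=&r" (v)
                    : "r" (p)
                    : "memory");
  return v;
}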


r~
