On 02/02/17 14:52, Jakub Jelinek wrote:
On Thu, Feb 02, 2017 at 02:48:42PM +0000, Ramana Radhakrishnan wrote:
On 30/01/17 18:54, Torvald Riegel wrote:
This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.
If there is no hardware-backed atomic load for a certain memory
location, the current implementation may implement the load with a CAS
while still claiming that the access is lock-free. This is a bug in the
case of volatile atomic loads and atomic loads from read-only-mapped
memory; it also creates a lot of contention in the case of concurrent
atomic loads, which results in performance that is at best
counter-intuitive, because most users probably understand "lock-free" to
mean hardware-backed (and thus "fast") rather than merely lock-free in
the progress-criteria sense.
This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
Will Deacon pointed me at this thread asking if something similar could be
done on ARM.
On armv8-a we can implement an atomic load of 16 bytes using an
LDXP / STXP loop, as a 16-byte load isn't single-copy atomic. On
armv8.1-a we do have a CAS on 16 bytes.
If the AArch64 ISA guarantees LDXP is atomic, then yes, you can do that.
The problem we have on x86_64 is that, as far as I know, neither Intel
nor AMD has given us a guarantee that aligned SSE or AVX loads are
atomic.
LDXP is not single-copy atomic, so this would become something like the
following, with appropriate additional barriers. You need to write it as
a loop:
.retry:
        LDXP    x0, x1, [x2]
        STXP    w3, x0, x1, [x2]
        CBNZ    w3, .retry
You have to do the write again to guarantee atomicity on AArch64.
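For completeness, here is a hedged C sketch (mine, not from the thread;
the function name is hypothetical) of such a 16-byte load using the
acquire/release forms of the exclusive pair, which is one way to supply
the additional barriers mentioned above. Note that it still performs a
store, so it inherits the read-only-memory problem discussed below.

#include <stdint.h>

static inline __uint128_t load16_seq_cst(__uint128_t *ptr)
{
    uint64_t lo, hi;
    uint32_t fail;
    __asm__ volatile(
        "0:     ldaxp   %0, %1, %3\n"       /* load-acquire exclusive pair */
        "       stlxp   %w2, %0, %1, %3\n"  /* write the same value back   */
        "       cbnz    %w2, 0b\n"          /* lost exclusivity: retry     */
        : "=&r"(lo), "=&r"(hi), "=&r"(fail), "+Q"(*ptr)
        :
        : "memory");
    return ((__uint128_t)hi << 64) | lo;
}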
I missed the first paragraph in this thread on gcc@.
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
I consider this a bug because it can
result in a store being issued (e.g., when loading from a read-only
page, or when trying to do a volatile atomic load), and because it can
increase contention (which makes atomic loads perform much differently
than HW load instructions would). See the thread "GCC libatomic ABI
specification draft" for more background.
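To illustrate the read-only case concretely (my example, not Torvald's):
a const-qualified 16-byte object typically lands in a read-only mapping,
so a CAS-implemented load traps.

/* If the 16-byte atomic load is expanded as a CAS, the hidden store
   faults (SIGSEGV) on the read-only .rodata mapping, even though the
   value is never actually modified.  */
static const __uint128_t ro_value = 42;

__uint128_t read_ro(void)
{
    __uint128_t tmp;
    __atomic_load((__uint128_t *)&ro_value, &tmp, __ATOMIC_RELAXED);
    return tmp;
}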
OK, this means implementing such a change in libatomic for AArch64 will
introduce the bug that Torvald is worried about on x86_64.
regards
Ramana