Got an error from the gcc@gcc.gnu.org alias. Removed the pdf attachment and re-sent it to the alias ...

On 11/14/2016 4:34 PM, Bin Fan wrote:
Hi All,

I have an updated version of the libatomic ABI specification draft. Please take a look to see if it matches the GCC implementation. The purpose of this document is to establish an official GCC libatomic ABI and to allow compatible compiler and runtime implementations on the affected platforms.

Compared to the last version you reviewed, here are the major updates:

- Rewrite the notes in N2.3.2 to explicitly mention that the implementation of __atomic_compare_exchange follows memcmp/memcpy semantics, and the consequences of that.

- Rewrite section 3 to replace "lock-free" operations with "hardware backed" instructions. The digest of this section is: 1) inlineable atomics must be implemented with hardware backed atomic instructions; 2) for non-inlineable atomics, the compiler must generate a runtime call, and the runtime support function is free to use any implementation.

- The Rationale section in section 3 is also revised to remove the mention of "lock-free", but there is no major conceptual change.

- Add note N3.1 to emphasize the assumption of generally available hardware supported atomic instructions.

- Add note N3.2 to discuss the issues of cmpxchg16b

- Add a paragraph in section 4.1 to specify memory_order_consume must be implemented through memory_order_acquire. Section 4.2 emphasizes it again.

- The specification of each runtime function mostly maps to the corresponding generic function in the C11 standard. Two functions are worth noting: 1) C11 atomic_compare_exchange compares and updates the "value", while the __atomic_compare_exchange functions in this ABI compare and update the "memory", which implies memcmp and memcpy semantics. 2) The specification of __atomic_is_lock_free allows both a per-object result and a per-type result. A per-type implementation could pass NULL or a fake address as the address of the object; a per-object implementation could pass the actual address of the object.

Thanks,
- Bin

On 8/10/2016 3:33 PM, Bin Fan wrote:
Hi Torvald,

Thanks a lot for your review. Please find my response inline...

On 8/5/2016 8:51 AM, Torvald Riegel wrote:
[CC'ing Andrew MacLeod, who has been working on the atomics too.]

On Tue, 2016-08-02 at 16:28 -0700, Bin Fan wrote:
I'm wondering if you have had a chance to review the revised libatomic ABI
draft. The email was rejected by the gcc alias once due to some html
stuff in the email text. Though I resent a plain-text version, I'm
not sure it worked, so this time I am dropping the gcc alias.

If you do not have any issues, I'm wondering if this ABI draft could be
published in some GCC wiki or documentation? I'd be happy to prepare a
version without the "notes" part.



Because the padding of structure types is not affected by the _Atomic
modifier, the contents of any padding in an atomic structure object
are still undefined; therefore an atomic compare and exchange operation
on such objects may fail due to differences in the padding.
I think this isn't quite clear.
This paragraph is just to clarify that _Atomic does not change (e.g. zero out) the padding bits, whose contents were undefined in the current SPARC and x86 ABI specifications, and will
still be undefined for _Atomic aggregates.

This paragraph is part of the "notes" rather than the main body of the ABI draft. If it is not clear,
I will change it to mention the memcmp/memcpy-like semantics.

Perhaps it's easier to describe it in
the way that C++ does, referring to the memcmp/memcpy-like semantics of
compare_exchange (e.g., see N4606 29.6.5p27).
C11 isn't quite clear about this, or I am misunderstanding what they
really mean by "value of the object" (see N1570 7.17.7.4p2).
This is the subject of C11 Defect Report 431:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_431
which has been fixed to align with the C++ standard and closed with a
Proposed Technical Corrigendum which will appear in the next revision
of the C standard (~2017).

Note that in section 4.2 of this ABI draft, the function description of
__atomic_compare_exchange uses "compares the memory pointed to by object" instead of "compares the value pointed to by object" as you quoted from N1570 7.17.7.4p2.

Since you asked whether you should review the function descriptions: this is one of two cases worth noticing. I will mention the other one later in this email.

Lock-free atomic operations do not require runtime support functions.
The compiler may generate inlined code for efficiency. This ABI
specification defines a few inlineable atomic types. An atomic type
being inlineable means the compiler may generate an inlined instruction
sequence for atomic operations on such types. The implementation of
the support functions for the inlineable atomic types must also be
lock free.
I think it's better to say that the support functions must be compatible
with what the compiler would generate.  That they are "lock-free" is
just a forward progress property. This also applies to later paragraphs in the draft. Maybe we need to use a different term here, so we can use
it for what we want (ie, a HW-backed, inlineable operation).
I agree that lock-free atomic operations are not equivalent to HW-backed atomic operations. I will think about how to phrase this in the ABI. My current thought is, as
you suggested, to change "lock-free" to "HW-backed".

So an example of the updated specification would be like this:
The implementation of the support functions for the inlineable atomic types must use HW-backed atomic instructions. For atomic operations on non-inlineable types, the compiler
must always generate support function calls.

On all affected platforms, atomic types whose size equals 1, 2, 4
or 8 bytes and whose alignment matches the size are inlineable.

On 64-bit x86 platforms that support the cmpxchg16b instruction,
16-byte atomic types whose alignment matches the size are inlineable.
I still think making 16-byte atomic types inlined / lock-free when all
we have is a wide cmpxchg is wrong.  AFAIK there is no atomic 16-byte
load instruction on x86 (or is there?), even though cmpxchg16b might be
available.
At least GCC 6.1.0 still generates cmpxchg16b for an atomic load with -march=native
on my Haswell machine.
I'd prefer if we could fix this in GCC in some way instead
of requiring this by putting it into the ABI.  This also applies to the
double-wide CAS on i386.
IIRC, there is a BZ about this somewhere, but I don't find it.
Andrew, do you remember?

Basically, there is a correctness and a performance problem.
The atomic variable might be in a read-only-mapped page, which isn't
unreasonable given that the C/C++ standards explicitly require lock-free
atomics to be address-free too, which is a clear hint towards enabling
mapping memory to more than one place in the address space. So, if the
user does an atomic load on a 16-byte variable accessible through a
read-only page, we'll get a segfault.
One could argue that C/C++ don't provide any mmap feature, and thus you
can't expect this to work.  But this doesn't seem a good argument to
make from a user's perspective.

Second, I'd argue that the "lock-free" property is used by most users as
an indication of which atomics might be as fast as one would expect
typical HW to be -- not because they are interested in the forward
progress aspect or the address-free aspect.  If atomic loads do cause
writes, the performance of a load will be horrible because of the
contention in cases where many threads issue loads.
If the 16-byte atomic read is implemented in software, the current implementation still uses a lock/mutex, meaning a write will happen somewhere, maybe not directly on the object's memory but somewhere else (a spinlock or a mutex). It resolves the read-only issue you mentioned, because the write is on the lock rather than on the
object, but there would still be the performance issue of contention.

There are some advanced software algorithms that can make this
mostly-readers, occasional-writer scenario more efficient (for example, the seqlock mentioned
here: http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
The performance of such algorithms depends highly on the use case, so maybe the user should implement their own algorithm instead of relying on the compiler/libatomic
library to provide the best performance in all cases.
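To make that concrete, here is a rough reader-side sketch of the seqlock idea from that paper (my own sketch under C11 atomics, not code from the paper; the writer side, which makes seq odd around its updates, is omitted):

#include <stdatomic.h>

typedef struct {
  atomic_uint seq;                    /* even: stable; odd: write in progress */
  _Atomic unsigned long long lo, hi;  /* 16-byte payload, two halves */
} seqlock16;

void seqlock16_read (seqlock16 *s,
                     unsigned long long *lo, unsigned long long *hi)
{
  unsigned s0, s1;
  do
    {
      do
        s0 = atomic_load_explicit (&s->seq, memory_order_acquire);
      while (s0 & 1);                       /* a writer is in progress */
      *lo = atomic_load_explicit (&s->lo, memory_order_relaxed);
      *hi = atomic_load_explicit (&s->hi, memory_order_relaxed);
      atomic_thread_fence (memory_order_acquire);
      s1 = atomic_load_explicit (&s->seq, memory_order_relaxed);
    }
  while (s0 != s1);                         /* retry if a write intervened */
}

Readers perform no writes at all, which avoids the contention problem, at the cost of retries when writes intervene.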
This is even more
unfortunate considering that if one has a 64b CAS, then one can
increment a 64b counter which can be considered to never overflow, which
allows one to build efficient atomic snapshots of larger atomic
variables.
OTOH, some people would like to use the GCC builtins to get access to
cmpxchg16b.

Irrespective of how we deal with this, we should at least document the
current state and the problems associated with it.  Maybe we should
consider providing separate builtins for cmpxchg16b.
I'm OK with the current GCC implementation, which I believe matches the ABI draft. And
we can document the current issues in an appendix or somewhere similar.
If GCC is willing to change, I'm also OK with specifying that 16-byte atomic types are
not inlineable.

"Inlineability" is a compile time property, which in most cases depends
only on the type. In a few cases it also depends on whether the target
ISA supports the cmpxchg16b instruction. A compiler may get the ISA
information by either compilation flags or inquiring the hardware
capabilities. When the hardware capabilities information is not available, the compiler should assume the cmpxchg16b instruction is not supported.
I think that strictly speaking, it always depends on the target ISA,
because we assume that it provides 1-byte atomic operations, for
example.
Right. The ABI specification itself is ISA-specific. For example, if we call it a SPARC V9 ABI amendment, then it is safe to assume that the ISA supports 1-, 2-, 4- and 8-byte atomic hardware instructions, and thus it is safe to make such a specification of "inlineable" in the ABI.

I'm not very familiar with x86 ISA versioning. I used to assume cmpxchg16b is available on all of today's mainstream x86 platforms until I found that Xeon Phi does not support it. That's
why the ABI says it depends on the target ISA.

    memory_order_consume = 1,
[...]
Refer to C standard for the meaning of each enumeration constants of
memory_order type.
[...]
Most of the functions listed in this section can be mapped to the generic
functions with the same semantics in the C standard. Refer to the C
standard for the description of the generic functions and how each memory
order works.
We need to say that memory_order_consume must be implemented through
memory_order_acquire.  The compiler can't preserve dependencies
correctly and will never be able to for the current specification of
consume.  Thus, we must fall back to acquire MO.
As far as I can tell, neither SPARC nor x86 has instructions that may benefit from the consume
order. So I'm happy to make this change.

I haven't looked at the descriptions of the individual atomic operations
in detail.  Let me know if I should.
Above I mentioned that there are two places in the descriptions that may be interesting. One is __atomic_compare_exchange, covered earlier in this email. The other is
__atomic_is_lock_free. This is based on Richard's comments.

Thanks again for your review. I will send a new draft based on your comments. Please send me
any further comments/suggestions.

Thanks,
- Bin


Torvald



LIBATOMIC ABI SPECIFICATION DRAFT


1. Overview


1.1. Why we need an ABI for atomics


The C11 standard allows atomic types to differ in size, representation and alignment 
from the corresponding non-atomic types [1]. The size, representation and 
alignment of atomic types therefore need to be specified in the ABI specification.


A runtime support library, libatomic, already exists on Solaris and Linux. The 
interface of this library needs to be standardized as part of the ABI 
specification, so that


- On a system that supplies libatomic, all compilers in compliance with the ABI 
can generate compatible binaries linking against this library.
- The binaries can remain backward compatible across different versions of the 
system, as long as they support the same ABI.


1.2. What does the atomics ABI specify


The ABI specifies the following:


- Data representation of the atomic types.
- The names and behaviour of the implementation-specific support functions.
- The versioning of the library's external symbols.
- The atomic types for which the compiler may generate inlined code.
- Compatibility requirements for the inlined atomic operations.


Note that the atomic functions specified in the C standard are not part of 
this ABI, because they are not implementation-specific functions. 


1.3. Platforms affected by this ABI specification


SPARC (32-bit and 64-bit)
x86 (32-bit and 64-bit)


It is assumed that 64-bit SPARC platforms implement only the TSO (Total Store 
Order) memory model.


Sections 1.1 and 1.2, and the Rationale, Notes and Appendix sections, are for 
explanatory purposes only; they are not part of the formal ABI specification.


Notes


N1.3.1. Some 64-bit x86 platforms, such as some early AMD64 processors and the 
more modern Intel Xeon Phi co-processor, do not support the cmpxchg16b 
instruction. We discuss cmpxchg16b in detail in Section 3.


2. Data Representation


2.1. General Rules


The general rules for the size, representation and alignment of atomic types 
are the following:


1) Atomic types have the same size as the corresponding non-atomic types. 


2) Atomic types have the same representation as the corresponding 
non-atomic types. 


3) Atomic types have the same alignment as the corresponding non-atomic 
types, with the following exceptions:


On 32- and 64-bit x86 platforms and on 64-bit SPARC platforms, atomic types of 
size 1, 2, 4, 8 or 16 bytes have alignment that matches the size. 


On 32-bit SPARC platforms, atomic types of size 1, 2, 4 or 8 bytes have 
alignment that matches the size. If the alignment of a 16-byte non-atomic type 
is less than 8 bytes, the alignment of the corresponding atomic type is raised 
to 8 bytes.
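As an illustration of rule 3), the following fragment (a sketch assuming a compiler implementing this ABI on LP64 x86) shows the alignment being raised to match the size:

#include <stdio.h>

struct s8 { int a[2]; };  /* size 8; non-atomic alignment 4 on x86 */

int main (void)
{
  /* Expected output on LP64 x86: "4 8". The _Atomic version is raised
     to 8-byte alignment to match its 8-byte size. */
  printf ("%zu %zu\n", _Alignof (struct s8), _Alignof (_Atomic struct s8));
  return 0;
}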


Notes


N2.1.1. The above rules apply to both scalar types and aggregate types.


2.2. Atomic scalar types


x86


                                            LP64 (AMD64)                    ILP32 (i386)
C Type                                      sizeof  Alignment  Inlineable   sizeof  Alignment  Inlineable
atomic_flag                                 1       1          Y            1       1          Y
_Atomic _Bool                               1       1          Y            1       1          Y
_Atomic char                                1       1          Y            1       1          Y
_Atomic signed char                         1       1          Y            1       1          Y
_Atomic unsigned char                       1       1          Y            1       1          Y
_Atomic short                               2       2          Y            2       2          Y
_Atomic signed short                        2       2          Y            2       2          Y
_Atomic unsigned short                      2       2          Y            2       2          Y
_Atomic int                                 4       4          Y            4       4          Y
_Atomic signed int                          4       4          Y            4       4          Y
_Atomic enum                                4       4          Y            4       4          Y
_Atomic unsigned int                        4       4          Y            4       4          Y
_Atomic long                                8       8          Y            4       4          Y
_Atomic signed long                         8       8          Y            4       4          Y
_Atomic unsigned long                       8       8          Y            4       4          Y
_Atomic long long                           8       8          Y            8       8          Y
_Atomic signed long long                    8       8          Y            8       8          Y
_Atomic unsigned long long                  8       8          Y            8       8          Y
_Atomic __int128 (with at16)                16      16         Y            not applicable
_Atomic __int128 (w/o at16)                 16      16         N            not applicable
any-type _Atomic *                          8       8          Y            4       4          Y
_Atomic float                               4       4          Y            4       4          Y
_Atomic double                              8       8          Y            8       8          Y
_Atomic long double (with at16)             16      16         Y            12      4          N
_Atomic long double (w/o at16)              16      16         N            12      4          N
_Atomic float _Complex                      8       8(4)       Y            8       8(4)       Y
_Atomic double _Complex (with at16)         16      16(8)      Y            16      16(8)      N
_Atomic double _Complex (w/o at16)          16      16(8)      N            16      16(8)      N
_Atomic long double _Complex                32      16         N            24      4          N
_Atomic float _Imaginary                    4       4          Y            4       4          Y
_Atomic double _Imaginary                   8       8          Y            8       8          Y
_Atomic long double _Imaginary (with at16)  16      16         Y            12      4          N
_Atomic long double _Imaginary (w/o at16)   16      16         N            12      4          N


SPARC


                                            LP64 (v9)                       ILP32 (sparc)
C Type                                      sizeof  Alignment  Inlineable   sizeof  Alignment  Inlineable
atomic_flag                                 1       1          Y            1       1          Y
_Atomic _Bool                               1       1          Y            1       1          Y
_Atomic char                                1       1          Y            1       1          Y
_Atomic signed char                         1       1          Y            1       1          Y
_Atomic unsigned char                       1       1          Y            1       1          Y
_Atomic short                               2       2          Y            2       2          Y
_Atomic signed short                        2       2          Y            2       2          Y
_Atomic unsigned short                      2       2          Y            2       2          Y
_Atomic int                                 4       4          Y            4       4          Y
_Atomic signed int                          4       4          Y            4       4          Y
_Atomic enum                                4       4          Y            4       4          Y
_Atomic unsigned int                        4       4          Y            4       4          Y
_Atomic long                                8       8          Y            4       4          Y
_Atomic signed long                         8       8          Y            4       4          Y
_Atomic unsigned long                       8       8          Y            4       4          Y
_Atomic long long                           8       8          Y            8       8          Y
_Atomic signed long long                    8       8          Y            8       8          Y
_Atomic unsigned long long                  8       8          Y            8       8          Y
_Atomic __int128                            16      16         N            not applicable
any-type _Atomic *                          8       8          Y            4       4          Y
_Atomic float                               4       4          Y            4       4          Y
_Atomic double                              8       8          Y            8       8          Y
_Atomic long double                         16      16         N            16      8          N
_Atomic float _Complex                      8       8(4)       Y            8       8(4)       Y
_Atomic double _Complex                     16      16(8)      N            16      8          N
_Atomic long double _Complex                32      16         N            32      8          N
_Atomic float _Imaginary                    4       4          Y            4       4          Y
_Atomic double _Imaginary                   8       8          Y            8       8          Y
_Atomic long double _Imaginary              16      16         N            16      8          N


Here "with at16" means the ISA supports cmpxchg16b; "w/o at16" means the ISA
does not support cmpxchg16b.


Notes


N2.2.1. The C standard also specifies some atomic integer types. They are not in 
the above table because they have the same representation and alignment 
requirements as the corresponding direct types [2].


N2.2.2. We discuss the Inlineable column and the __int128 type in section 3.


N2.2.3. The value in parenthesis is the alignment of the corresponding 
non-atomic type, if it is different from the alignment of the atomic type.


N2.2.4. Because the _Atomic specifier cannot be used on a function type [7] and 
the _Atomic qualifier cannot modify a function type [8], no atomic 
function type is listed in the above table.


N2.2.5. On 32-bit x86 platforms, long double has a size of 12 bytes and an 
alignment of 4 bytes. This ABI specification does not increase the alignment of 
the _Atomic long double type.


2.3 Atomic Aggregates and Unions


Atomic structures or unions may have different alignment compared to the 
corresponding non-atomic types, subject to rule 3) in section 2.1. The 
alignment change only affects the boundary at which an entire structure or union 
is aligned. The offset of each member, the internal padding and the size of the 
structure or union are not affected.


The following table shows selected examples of the size and alignment of atomic 
structure types.


x86


                                            LP64 (AMD64)                    ILP32 (i386)
C Type                                      sizeof  Alignment  Inlineable   sizeof  Alignment  Inlineable
_Atomic struct {char a[2];}                 2       2(1)       Y            2       2(1)       Y
_Atomic struct {char a[3];}                 3       1          N            3       1          N
_Atomic struct {short a[2];}                4       4(2)       Y            4       4(2)       Y
_Atomic struct {int a[2];}                  8       8(4)       Y            8       8(4)       Y
_Atomic struct {char c;
                int i;}                     8       8(4)       Y            8       8(4)       Y
_Atomic struct {char c[2];
                short s;
                int i;}                     8       8(4)       Y            8       8(4)       Y
_Atomic struct {char a[16];} (with at16)    16      16(1)      Y            16      16(1)      N
_Atomic struct {char a[16];} (w/o at16)     16      16(1)      N            16      16(1)      N


SPARC


                                            LP64 (v9)                       ILP32 (sparc)
C Type                                      sizeof  Alignment  Inlineable   sizeof  Alignment  Inlineable
_Atomic struct {char a[2];}                 2       2(1)       Y            2       2(1)       Y
_Atomic struct {char a[3];}                 3       1          N            3       1          N
_Atomic struct {short a[2];}                4       4(2)       Y            4       4(2)       Y
_Atomic struct {int a[2];}                  8       8(4)       Y            8       8(4)       Y
_Atomic struct {char c;
                int i;}                     8       8(4)       Y            8       8(4)       Y
_Atomic struct {char c[2];
                short s;
                int i;}                     8       8(4)       Y            8       8(4)       Y
_Atomic struct {char a[16];}                16      16(1)      N            16      8(1)       N


Here "with at16" means the ISA supports cmpxchg16b; "w/o at16" means the ISA 
does not support cmpxchg16b.


Notes


N2.3.1. The value in parenthesis is the alignment of the corresponding 
non-atomic type, if it is different from the alignment of the atomic type.


N2.3.2. For aggregates that are not modified by _Atomic, the contents of the 
padding bits are undefined. For _Atomic aggregates, the contents of the padding 
bits are also undefined. The implementation of __atomic_compare_exchange 
follows the memcmp/memcpy semantics, which may result in unsuccessful 
comparisons due to the undefined contents of the padding bits. C11 is not clear 
about this; DR 431 [9] raised the issue, which has been resolved, and the fix 
will appear in the next revision of the C standard (~2017).
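The following user-level fragment illustrates the consequence described in N2.3.2 (an illustrative sketch; whether a failure is actually observed depends on the padding contents at run time):

#include <stdatomic.h>
#include <stdbool.h>

struct padded { char c; int i; };   /* 3 padding bytes after 'c' */

_Atomic struct padded obj;

bool try_update (void)
{
  struct padded expected = { .c = 'a', .i = 1 };
  struct padded desired  = { .c = 'b', .i = 2 };
  /* The comparison is byte-wise (memcmp-like), so undefined padding
     contents in either operand can cause a failure even when the
     members c and i are equal. */
  return atomic_compare_exchange_strong (&obj, &expected, desired);
}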


N2.3.3. The special alignment requirement on 16-byte atomic struct types might 
be useful for the following:
- Reducing lock sharing with other atomics.
- Allowing the related runtime support functions to choose more efficient 
instructions.


2.4. Bit-fields


It is implementation-defined in the C standard whether atomic bit-field 
types are permitted [3]. In this ABI specification, the representation of 
atomic bit-fields is unspecified.


3. Inlineable Property


Some atomic operations map directly to hardware backed atomic instructions. 
To implement an atomic operation, the compiler may generate inlined code using 
such instructions, or a support function call. This ABI specification defines a 
few inlineable atomic types. The specification of the inlineable attribute is 
the following:


1. The compiler may generate inlined hardware backed atomic instructions for 
atomic operations on an object of inlineable atomic type. The compiler is also 
allowed to generate a support function call.


2. The implementation of the support functions for an inlineable atomic type 
must use hardware backed atomic instructions to be compatible with the inlined 
code the compiler may generate.


3. If an atomic type is not inlineable, the compiler shall always generate 
support function calls for atomic operations on the objects of the type. The 
implementation of the support functions for the type is free to use hardware 
backed atomic instructions or any other approaches.


On all affected platforms, if the size of an atomic type is 1, 2, 4 or 8 bytes 
and its alignment matches the size, then the atomic type is inlineable.


On 64-bit x86 platforms that support the cmpxchg16b instruction, if the 
size of an atomic type is 16 bytes and its alignment matches the size, then the 
atomic type is inlineable (see the notes in this section for some caveats about 
this).
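For example (a sketch assuming GCC-style __atomic builtins, which this ABI does not mandate), an atomic load of a non-inlineable type must expand to a call to the generic libatomic entry point rather than to an inlined instruction sequence:

struct b3 { char a[3]; };   /* size 3, alignment 1: not inlineable */

struct b3 load_b3 (struct b3 *p)
{
  struct b3 r;
  /* The compiler must emit a call to the generic __atomic_load support
     function (with a leading size_t argument); inlining is not allowed
     for this type. */
  __atomic_load (p, &r, __ATOMIC_SEQ_CST);
  return r;
}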


Rationale


It is assumed that an atomic object must be accessed with compatible instructions 
to achieve atomicity. For example, a C atomic_compare_exchange operation may be 
implemented with the hardware compare-and-swap instruction, or by doing the 
compare and the swap in two separate steps protected by a software lock. The 
two implementations are not compatible, because the software lock used by thread 
T2 is not visible to thread T1's hardware compare-and-swap instruction; 
therefore the swap may happen while thread T2 is holding the lock. So the two 
implementations should not be used to access the same object at the same time 
in a run of the program.


If the compiler always generated support function calls for all atomic 
operations, the aforementioned compatibility problem would never happen. But 
the compiler should be allowed, though not forced, to generate inlined code for 
some atomic operations for better performance. It should be guaranteed that 
if/when the compiler generates inlined code, it is compatible with the 
library implementation.


So this ABI specifies a few inlineable atomic types, for which the compiler may 
generate inlined code, and both the inlined code and the implementation of the 
corresponding support functions must use hardware backed atomic instructions. 


Two alternatives considered


1. Specify a type-based criterion: for all types that meet the criterion, 
both the compiler and the support functions must use hardware backed atomic 
instructions; for all types that do not meet the criterion, neither the 
compiler nor the support functions may use hardware backed atomic instructions.
 
The C and C++ standards seem to back this approach: the C++ standard provides a 
query that returns a per-type result about whether the type is lock-free [4]. The C 
standard does not guarantee that the query result is per-type [5], but it will 
in the next revision [6]. The problem is that the query result does not 
necessarily reflect the implementation of the atomic operations on the
queried type. Even if is_lock_free returns false for an object because of its 
type, the implementation may still use hardware backed atomic instructions for 
the object. Say there is an atomic type with size 3 bytes and alignment 1 byte. 
This type cannot always use hardware backed atomic instructions because of its 
alignment, but it can when the runtime address happens to be 4-byte aligned. So 
this approach is unnecessarily conservative.


The ABI differs from this alternative in that the ABI allows the runtime 
implementation for a non-inlineable atomic type to use hardware backed atomic 
instructions.


2. Specify an object-based criterion: if an atomic object meets the 
criterion, both the compiler and the support functions must use hardware backed 
atomic instructions; otherwise, neither the compiler nor the support functions 
may use hardware backed atomic instructions. 


The criterion would be based on some runtime information, such as the alignment of 
the object's address, which would be difficult for the compiler to determine at 
compile time. It would be much easier for the runtime to do such optimization, 
and to let the compiler always generate calls for such types of objects.


Notes:


N3.1. This ABI assumes 1-, 2-, 4- and 8-byte hardware atomic instructions are 
available on all relevant platforms. This means that for objects of those sizes, 
naturally aligned load and store instructions are guaranteed to be atomic, and 
variants of atomic compare-and-swap instructions are available as well.


N3.2. About cmpxchg16b


This ABI document specifies that if cmpxchg16b is supported on a 64-bit x86 
platform, then 16-byte properly aligned atomics are inlineable on the platform. 


The only instruction available on such platforms to implement atomic load, 
store, exchange and compare_exchange operations is cmpxchg16b. One could argue 
that xmm registers can be used to do a 16-byte memory move, but such a move is not 
guaranteed to be atomic in the current Intel manual [12]. This leads to the 
following caveats in implementing the current ABI specification:


1. cmpxchg16b performs a write on the affected memory location. If the atomic 
variable is in a read-only mapped page, then using cmpxchg16b to do the load 
will cause a segfault. One could argue that mmap is not part of the C/C++ 
specifications, but some notes in the C/C++ specifications imply the mmap semantics. 
C11 explicitly mentions that lock-free atomic operations should be address-free: the 
same memory location could be mapped to two different addresses, and atomic 
operations on this location should still communicate atomically [10]. A similar 
note can also be found in C++11 [11]. 


2. Using cmpxchg16b may not give the atomic_load on a GCC _Atomic __int128 
object the expected performance. One would expect that in the no-contention 
scenario, the hardware-backed atomic load implementation runs at full speed, 
just like the 1-, 2-, 4- and 8-byte atomic loads. However, the write operation 
on the affected memory location effectively turns the read-only scenario into a 
high-contention scenario, significantly degrading performance. One might 
argue that a software lock implementation is not any better, because the lock 
implementation will probably still perform a write or a compare-and-swap 
operation anyway. But a runtime implementation could also choose a more 
flexible implementation, such as a seqlock [13], to make the mostly-readers 
scenario more efficient. Or, if the runtime just exposes cmpxchg16b as an 
intrinsic, an expert user can build his/her own implementation. 
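To show where the write in caveats 1 and 2 comes from, here is a minimal sketch of a 16-byte atomic load built on cmpxchg16b (expressed with a GCC __sync builtin; assumes GCC on x86-64 with -mcx16):

__int128 load_16_via_cas (__int128 *object)
{
  /* cmpxchg16b always performs a locked write: the desired value (here
     equal to the expected value, 0) on success, or a write-back of the
     old value on failure. The contents of *object never change, but the
     write still occurs, which is exactly the source of the
     read-only-page segfault and the contention described above. */
  return __sync_val_compare_and_swap (object, (__int128) 0, (__int128) 0);
}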


Although this ABI specification specifies that 16-byte properly aligned atomics 
are inlineable on platforms supporting cmpxchg16b, we document the caveats here 
for further discussion. If we decide to change the inlineable attribute for 
those atomics, then this ABI, the compiler and the runtime implementation 
should be updated together at the same time.


The compiler and the runtime need to check the availability of cmpxchg16b to 
implement this ABI specification. Here is how it would work: the compiler can 
get the information either from compiler flags or by inquiring about the hardware 
capabilities. When the information is not available, the compiler should assume 
that cmpxchg16b is not supported. The runtime library implementation can also 
query the hardware capabilities and choose the implementation at runtime. 
Assuming the user provides correct compiler options and the inquiry returns the 
correct information, on a platform that supports cmpxchg16b, both the code 
generated by the compiler and the runtime library will use cmpxchg16b; on a 
platform that does not support cmpxchg16b, the code generated by the compiler, 
including code generated for a generic platform, always calls the support 
functions, so there is no compatibility problem. 


N3.3. Here are a few examples of small types which don't qualify as inlineable:


  _Atomic struct {char a[3];} /* size = 3, alignment = 1 */
  _Atomic long double /* (on 32-bit x86) size = 12, alignment = 4 */


A smart compiler may know that such an object is located at an address that fits 
in an 8-byte aligned window, but the ABI does not allow the compiler to generate 
an inlined code sequence using hardware backed atomic instructions. This is 
because another compiler, or the same compiler at a different optimization 
level, may generate a support function call, and the support function 
implementation is not required to use compatible instructions.


4. libatomic library functions


4.1. Data Definitions


This section contains examples of system header files that provide the data 
interface needed by the libatomic functions.


<stdatomic.h>


typedef enum
{
    memory_order_relaxed = 0,
    memory_order_consume = 1,
    memory_order_acquire = 2,
    memory_order_release = 3,
    memory_order_acq_rel = 4,
    memory_order_seq_cst = 5
} memory_order;


typedef _Atomic struct
{
  unsigned char __flag;
} atomic_flag;


Refer to the C standard for the meaning of each enumeration constant of the
memory_order type.


memory_order_consume must be implemented through memory_order_acquire.


Notes
N4.1.1. All platforms affected by this ABI specification implement a strong 
memory model, on which memory_order_consume does not provide any benefit over 
memory_order_acquire. Therefore this ABI specifies that memory_order_consume is 
raised to memory_order_acquire. 
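A support-function implementation might honor this with a simple clamp (a hypothetical helper, not part of the ABI; memory_order as defined above):

static inline memory_order effective_order (memory_order order)
{
  /* memory_order_consume is raised to memory_order_acquire;
     every other order is passed through unchanged. */
  return order == memory_order_consume ? memory_order_acquire : order;
}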


<fenv.h>


SPARC


#define FE_INEXACT    0x01
#define FE_DIVBYZERO  0x02
#define FE_UNDERFLOW  0x04
#define FE_OVERFLOW   0x08
#define FE_INVALID    0x10


x86


#define FE_INVALID    0x01
#define FE_DIVBYZERO  0x04
#define FE_OVERFLOW   0x08
#define FE_UNDERFLOW  0x10
#define FE_INEXACT    0x20


4.2. Support Functions


The following kinds of atomic operations are supported by the runtime library: 
load, store, exchange, compare-and-exchange and arithmetic read-modify-write 
operations. For the arithmetic read-modify-write operations, the following 
kinds of modification operations are supported: addition, subtraction, bitwise 
inclusive or, bitwise exclusive or, bitwise and, and bitwise nand. There are also 
test-and-set functions.


For each kind of atomic operation, libatomic provides a generic version, which 
accepts a pointer to an object of any atomic type, and several size-specific 
functions. The size-specific versions pass and return data by value; the 
generic versions pass and return data via pointers.


Most of the functions listed in this section can be mapped to the corresponding 
generic functions in C11. Refer to the C11 standard for the description of 
the generic functions and how each memory order works. Note that 
memory_order_consume must be implemented through memory_order_acquire.
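For orientation, here is a user-level C11 call and the entry point it may be lowered to (an illustrative mapping; for an inlineable type the compiler may inline the operation instead):

#include <stdatomic.h>
#include <stdint.h>

_Atomic int32_t counter;

int32_t bump (void)
{
  /* For a 4-byte object this may become a call to
     __atomic_fetch_add_4 (&counter, 1, memory_order_relaxed),
     or an inlined instruction sequence. */
  return atomic_fetch_add_explicit (&counter, 1, memory_order_relaxed);
}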


The following functions are available on all platforms.


void __atomic_load (size_t size, void *object, void *loaded, memory_order 
order);


Atomically load the value pointed to by object. Assign the loaded value to the 
memory pointed to by loaded. The size of memory affected by the load is 
designated by size.


int8_t __atomic_load_1 (int8_t *object, memory_order order);
int16_t __atomic_load_2 (int16_t *object, memory_order order);
int32_t __atomic_load_4 (int32_t *object, memory_order order);
int64_t __atomic_load_8 (int64_t *object, memory_order order);


Atomically load the value pointed to by object. The loaded value is returned. 
The size of memory affected by the load is designated by the type of the 
object. If object is not aligned properly according to the type of object, the 
behavior is undefined. 


Memory is affected according to the value of order. If order is either 
memory_order_release or memory_order_acq_rel, the behavior of the function is 
undefined.


void __atomic_store (size_t size, void *object, void *desired, memory_order 
order);


Atomically replace the value pointed to by object with the value pointed to by 
desired. The size of memory affected by the store is designated by size.


void __atomic_store_1 (int8_t *object, int8_t desired, memory_order order);
void __atomic_store_2 (int16_t *object, int16_t desired, memory_order order);
void __atomic_store_4 (int32_t *object, int32_t desired, memory_order order);
void __atomic_store_8 (int64_t *object, int64_t desired, memory_order order);


Atomically replace the value pointed to by object with desired. The size of 
memory affected by the store is designated by the type of the object. If object 
is not aligned properly according to the type of object, the behavior is 
undefined.


Memory is affected according to the value of order. If order is one of 
memory_order_acquire, memory_order_consume or memory_order_acq_rel, the 
behavior of the function is undefined.


void __atomic_exchange (size_t size, void *object, void *desired, void *loaded, 
memory_order order);


Atomically, replace the value pointed to by object with the value pointed to by 
desired, and assign to the memory pointed to by loaded the value of the object 
immediately before the effect. The size of memory affected by the 
exchange is designated by size.


int8_t __atomic_exchange_1 (int8_t *object, int8_t desired, memory_order order);
int16_t __atomic_exchange_2 (int16_t *object, int16_t desired, memory_order order);
int32_t __atomic_exchange_4 (int32_t *object, int32_t desired, memory_order order);
int64_t __atomic_exchange_8 (int64_t *object, int64_t desired, memory_order order);


Atomically, replace the value pointed to by object with desired and return the 
value pointed to by object immediately before the effect. The size of memory 
affected by the exchange is designated by the type of object. If object is not 
aligned properly according to the type of object, the behavior is undefined.


Memory is affected according to the value of order.


_Bool __atomic_compare_exchange (size_t size, void *object, void *expected, 
void *desired, memory_order success_order, memory_order failure_order);


Atomically, compares the memory pointed to by object for equality with the 
memory pointed to by expected, and if true, replaces the memory pointed to by 
object with the memory pointed to by desired, and if false, updates the memory 
pointed to by expected with the memory pointed to by object. The result of the 
comparison is returned. The size of memory affected by the compare and exchange 
is designated by size.


The compare and exchange never fails spuriously, i.e. if the comparison for 
equality returns false, the two values in the comparison were not equal. [Note: 
this is to specify that on SPARC and x86, compare exchange is always 
implemented with "strong" semantics. The weak flavors in the C standard are 
translated to strong.]


_Bool __atomic_compare_exchange_1 (int8_t *object, int8_t *expected, int8_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_2 (int16_t *object, int16_t *expected, int16_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_4 (int32_t *object, int32_t *expected, int32_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_8 (int64_t *object, int64_t *expected, int64_t 
desired, memory_order success_order, memory_order failure_order);


Atomically, compares the memory pointed to by object for equality with the 
memory pointed to by expected, and if true, replaces the memory pointed to by 
object with desired, and if false, updates the memory pointed to by expected 
with the memory pointed to by object. The result of the comparison is returned.


The size of memory affected by the compare and exchange is designated by the 
type of object. If object is not aligned properly according to the type of 
object, the behavior is undefined.


The compare and exchange never fails spuriously, i.e. if the comparison for 
equality returns false, the two values in the comparison were not equal.


If the comparison is true, memory is affected according to the value of 
success_order; if the comparison is false, memory is affected according to 
the value of failure_order.


int8_t __atomic_add_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_add_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_add_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_add_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically replaces the value pointed to by object with the result of the value 
pointed to by object plus operand and returns the value pointed to by object 
immediately after the effects. If object is not aligned properly according to 
the type of object, the behavior is undefined. The size of memory affected by 
the effects is designated by the type of object.


int8_t __atomic_fetch_add_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_add_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_add_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_add_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically replaces the value pointed to by object with the result of the value 
pointed to by object plus operand and returns the value pointed to by object 
immediately before the effects. If object is not aligned properly according to 
the type of object, the behavior is undefined. The size of memory affected by 
the effects is designated by the type of object.


Memory is affected according to the value of order.


int8_t __atomic_sub_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_sub_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_sub_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_sub_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically replaces the value pointed to by object with the result of the value 
pointed to by object minus operand and returns the value pointed to by object 
immediately after the effects. If object is not aligned properly according to 
the type of object, the behavior is undefined. The size of memory affected by 
the effects is designated by the type of object.


int8_t __atomic_fetch_sub_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_sub_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_sub_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_sub_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically replaces the value pointed to by object with the result of the value 
pointed to by object minus operand and returns the value pointed to by object 
immediately before the effects. If object is not aligned properly according to 
the type of object, the behavior is undefined.  The size of memory affected by 
the effects is designated by the type of object.


Memory is affected according to the value of order.


int8_t __atomic_and_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_and_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_and_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_and_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
and of the value pointed to by object and operand and returns the value pointed 
to by object immediately after the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined.  The size of memory 
affected by the effects is designated by the type of object.


int8_t __atomic_fetch_and_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_and_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_and_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_and_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
and of the value pointed to by object and operand and returns the value pointed 
to by object immediately before the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined. The size of memory 
affected by the effects is designated by the type of object.


Memory is affected according to the value of order.


int8_t __atomic_or_fetch_1 (int8_t *object, int8_t operand, memory_order order);
int16_t __atomic_or_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_or_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_or_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
or of the value pointed to by object and operand and returns the value pointed 
to by object immediately after the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined. The size of memory 
affected by the effects is designated by the type of object.


int8_t __atomic_fetch_or_1 (int8_t *object, int8_t operand, memory_order order);
int16_t __atomic_fetch_or_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_or_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_or_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
or of the value pointed to by object and operand and returns the value pointed 
to by object immediately before the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined. The size of memory 
affected by the effects is designated by the type of object.


Memory is affected according to the value of order.


int8_t __atomic_xor_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_xor_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_xor_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_xor_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
xor of the value pointed to by object and operand and returns the value pointed 
to by object immediately after the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined. The size of memory 
affected by the effects is designated by the type of object.


int8_t __atomic_fetch_xor_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_xor_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_xor_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_xor_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
xor of the value pointed to by object and operand and returns the value pointed 
to by object immediately before the effects. If object is not aligned properly 
according to the type of object, the behavior is undefined. The size of memory 
affected by the effects is designated by the type of object.


Memory is affected according to the value of order.


int8_t __atomic_nand_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_nand_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_nand_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_nand_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
nand of the value pointed to by object and operand and returns the value 
pointed to by object immediately after the effects. If object is not aligned 
properly according to the type of object, the behavior is undefined. The size 
of memory affected by the effects is designated by the type of object.


Bitwise operator nand is defined as the following using ANSI C operators: a 
nand b is equivalent to ~(a & b).


int8_t __atomic_fetch_nand_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_nand_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_nand_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_nand_8 (int64_t *object, int64_t operand, memory_order 
order);


Atomically, replaces the value pointed to by object with the result of bitwise 
nand of the value pointed to by object and operand and returns the value 
pointed to by object immediately before the effects. If object is not aligned 
properly according to the type of object, the behavior is undefined. The size 
of memory affected by the effects is designated by the type of object.


Bitwise operator nand is defined as the following using ANSI C operators: a 
nand b is equivalent to ~(a & b).


Memory is affected according to the value of order.


_Bool __atomic_test_and_set_1 (int8_t *object, memory_order order);
_Bool __atomic_test_and_set_2 (int16_t *object, memory_order order);
_Bool __atomic_test_and_set_4 (int32_t *object, memory_order order);
_Bool __atomic_test_and_set_8 (int64_t *object, memory_order order);


Atomically, checks the value pointed to by object; if it is in the clear 
state, sets the value pointed to by object to the set state and returns true, 
and if it is in the set state, returns false. The size of memory affected by 
the effects is always one byte.


Memory is affected according to the value of order.


The set and clear states are the same as specified for atomic_flag_test_and_set.


_Bool __atomic_is_lock_free (size_t size, void *object);


Returns whether the object pointed to by object is lock-free. The function 
assumes that the size of the object is size. If object is NULL, the 
function assumes that the object is aligned on a size-byte address.

The function takes the size of an object and an address, which is one of the 
following three cases:
- the address of the object,
- a fake address that solely indicates the alignment of the object's address, or
- NULL, which means that the alignment of the object matches size.
In all cases it returns whether such an object is lock-free.
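For illustration, the two querying styles look like this (a sketch assuming a compiler that exposes __atomic_is_lock_free as a builtin, as GCC does):

#include <stdbool.h>
#include <stddef.h>

bool per_type_query (void)
{
  /* Per-type query: NULL object; the runtime assumes an 8-byte aligned
     address, so the answer depends only on size and alignment. */
  return __atomic_is_lock_free (8, NULL);
}

bool per_object_query (long long *p)
{
  /* Per-object query: the runtime may inspect the actual alignment. */
  return __atomic_is_lock_free (sizeof *p, p);
}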


void __atomic_feraiseexcept (int exception);


Raises the floating-point exception(s) specified by exception. The int 
argument exception represents a subset of the floating-point exceptions, and can 
be zero or the bitwise OR of one or more floating-point exception macros. The 
macros are defined in fenv.h in section 4.1.
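A short usage sketch (the macros come from the <fenv.h> values shown in section 4.1):

#include <fenv.h>

extern void __atomic_feraiseexcept (int exception);

void raise_overflow (void)
{
  /* Raise overflow together with the inexact exception that
     conventionally accompanies it. */
  __atomic_feraiseexcept (FE_OVERFLOW | FE_INEXACT);
}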


4.3. 64-bit Specific Interfaces


4.3.1. Data Representation of __int128 type


On x86 platforms, the __int128 type is defined in the 64-bit ABI.


On SPARC platforms, the size and alignment of the __int128 type are specified as 
follows:


             sizeof   Alignment
__int128       16        16


4.3.2. Support Functions


The following functions are available only on 64-bit platforms.


__int128 __atomic_load_16 (__int128 *object, memory_order order);
void __atomic_store_16 (__int128 *object, __int128 desired, memory_order order);
__int128 __atomic_exchange_16 (__int128 * object,  __int128 desired, 
memory_order order);
_Bool __atomic_compare_exchange_16 (__int128 *object, __int128 *expected, 
__int128 desired, memory_order success_order, memory_order failure_order);
__int128 __atomic_add_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_add_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_sub_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_sub_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_and_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_and_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_or_fetch_16 (__int128 *object, __int128 operand, memory_order 
order);
__int128 __atomic_fetch_or_16 (__int128 *object, __int128 operand, memory_order 
order);
__int128 __atomic_xor_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_xor_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_nand_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_nand_16 (__int128 *object, __int128 operand, 
memory_order order);
_Bool __atomic_test_and_set_16 (__int128 *object, memory_order order);


The description of each function is the same as that of the corresponding 
function specified in section 4.2.


5. Libatomic symbol versioning


Here is the mapfile for symbol versioning of the libatomic library specified by 
this ABI specification:


LIBATOMIC_1.0 {
  global:
    __atomic_load;
    __atomic_store;
    __atomic_exchange;
    __atomic_compare_exchange;
    __atomic_is_lock_free;


    __atomic_add_fetch_1;
    __atomic_add_fetch_2;
    __atomic_add_fetch_4;
    __atomic_add_fetch_8;
    __atomic_add_fetch_16;
    __atomic_and_fetch_1;
    __atomic_and_fetch_2;
    __atomic_and_fetch_4;
    __atomic_and_fetch_8;
    __atomic_and_fetch_16;
    __atomic_compare_exchange_1;
    __atomic_compare_exchange_2;
    __atomic_compare_exchange_4;
    __atomic_compare_exchange_8;
    __atomic_compare_exchange_16;
    __atomic_exchange_1;
    __atomic_exchange_2;
    __atomic_exchange_4;
    __atomic_exchange_8;
    __atomic_exchange_16;
    __atomic_fetch_add_1;
    __atomic_fetch_add_2;
    __atomic_fetch_add_4;
    __atomic_fetch_add_8;
    __atomic_fetch_add_16;
    __atomic_fetch_and_1;
    __atomic_fetch_and_2;
    __atomic_fetch_and_4;
    __atomic_fetch_and_8;
    __atomic_fetch_and_16;
    __atomic_fetch_nand_1;
    __atomic_fetch_nand_2;
    __atomic_fetch_nand_4;
    __atomic_fetch_nand_8;
    __atomic_fetch_nand_16;
    __atomic_fetch_or_1;
    __atomic_fetch_or_2;
    __atomic_fetch_or_4;
    __atomic_fetch_or_8;
    __atomic_fetch_or_16;
    __atomic_fetch_sub_1;
    __atomic_fetch_sub_2;
    __atomic_fetch_sub_4;
    __atomic_fetch_sub_8;
    __atomic_fetch_sub_16;
    __atomic_fetch_xor_1;
    __atomic_fetch_xor_2;
    __atomic_fetch_xor_4;
    __atomic_fetch_xor_8;
    __atomic_fetch_xor_16;
    __atomic_load_1;
    __atomic_load_2;
    __atomic_load_4;
    __atomic_load_8;
    __atomic_load_16;
    __atomic_nand_fetch_1;
    __atomic_nand_fetch_2;
    __atomic_nand_fetch_4;
    __atomic_nand_fetch_8;
    __atomic_nand_fetch_16;
    __atomic_or_fetch_1;
    __atomic_or_fetch_2;
    __atomic_or_fetch_4;
    __atomic_or_fetch_8;
    __atomic_or_fetch_16;
    __atomic_store_1;
    __atomic_store_2;
    __atomic_store_4;
    __atomic_store_8;
    __atomic_store_16;
    __atomic_sub_fetch_1;
    __atomic_sub_fetch_2;
    __atomic_sub_fetch_4;
    __atomic_sub_fetch_8;
    __atomic_sub_fetch_16;
    __atomic_test_and_set_1;
    __atomic_test_and_set_2;
    __atomic_test_and_set_4;
    __atomic_test_and_set_8;
    __atomic_test_and_set_16;
    __atomic_xor_fetch_1;
    __atomic_xor_fetch_2;
    __atomic_xor_fetch_4;
    __atomic_xor_fetch_8;
    __atomic_xor_fetch_16;


  local:
    *;
};
LIBATOMIC_1.1 {
  global:
    __atomic_feraiseexcept;
} LIBATOMIC_1.0;
LIBATOMIC_1.2 {
  global:
    atomic_thread_fence;
    atomic_signal_fence;
    atomic_flag_test_and_set;
    atomic_flag_test_and_set_explicit;
    atomic_flag_clear;
    atomic_flag_clear_explicit;
} LIBATOMIC_1.1;


6. Libatomic Assumption on Non-blocking Memory Instructions


libatomic assumes that programmers or compilers properly insert
SFENCE/MFENCE barriers for the following cases:


1) writes executed with the CLFLUSH instruction;
2) streaming loads/stores ((V)MOVNTx, MASKMOVDQU, MASKMOVQ);
3) any other operations which reference the Write Combining memory type.


Rationale


x86 has a strong memory model: memory reads are not reordered with other reads, 
and writes are not reordered with reads or other writes. The three cases mentioned 
are exceptions, i.e. those writes are not ordered with respect to other writes. 
The ABI specifies that code using those non-blocking writes should contain proper 
fences, so that the libatomic support functions do not need fences to synchronize 
with those instructions.
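A sketch of case 2) (assuming x86 SSE2 intrinsics; per this section the fence is the responsibility of the code performing the streaming store, so that the subsequent atomic release store synchronizes correctly):

#include <emmintrin.h>
#include <stdatomic.h>

_Atomic int ready;

void publish (double *dst, __m128d v)
{
  _mm_stream_pd (dst, v);   /* (V)MOVNTPD: weakly ordered streaming store */
  _mm_sfence ();            /* order the streaming store before the flag  */
  atomic_store_explicit (&ready, 1, memory_order_release);
}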


Appendix


A.1. Compatibility Notes


On 64-bit SPARC platforms, _Atomic long double is a 16-byte naturally aligned 
atomic type. There is no hardware atomic instruction for such a type in the 
64-bit SPARC ISA, and it is not inlineable in this ABI specification.


If, in the future, hardware atomic instructions for 16-byte naturally aligned 
objects become available in a new SPARC ISA, then libatomic could leverage such 
instructions to implement atomic operations for _Atomic long double.


This would be a backward compatible libatomic change. Because the type is not 
inlineable, all atomic operations on objects of the type must go through libatomic 
function calls, so all such atomic operations can be changed to use hardware 
atomic instructions inside those libatomic functions without breaking the 
compiler-library interface.


However, if a compiler inlines an atomic operation on an _Atomic long double 
object using the new hardware atomic instructions, it breaks compatibility 
if the library implementation still does not use such instructions. In that 
case, the libatomic library and the compiler should be upgraded in lock-step, 
and the inlineable property for the affected atomic types must be updated.


If a compiler changes the data representation of atomic types, the change 
will produce incompatible binaries, and it would be hard to detect when 
incompatible binaries are linked together.


A.2. References


[1] INCITS/ISO/IEC 9899-2011[2012], 6.2.5p27
The size, representation, and alignment of an atomic type need not be the same 
as those of the corresponding unqualified type.


[2] INCITS/ISO/IEC 9899-2011[2012], 7.17.6p1
For each line in the following table,257) the atomic type name is declared as a 
type that has the same representation and alignment requirements as the 
corresponding direct type.258)


Footnote 258
258) The same representation and alignment requirements are meant to imply 
interchangeability as arguments to functions, return values from functions, and 
members of unions.


[3] INCITS/ISO/IEC 9899-2011[2012], 6.7.2.1p5
A bit-field shall have a type that is a qualified or unqualified version of 
_Bool, signed int, unsigned int, or some other implementation-defined type. It 
is implementation-defined whether atomic types are permitted.


[4] INCITS/ISO/IEC 14882-2011[2012], 29.4p2
The function atomic_is_lock_free (29.6) indicates whether the object is 
lock-free. In any given program execution, the result of the lock-free query 
shall be consistent for all pointers of the same type.


[5] INCITS/ISO/IEC 9899-2011[2012], 7.17.5.1p3
The atomic_is_lock_free generic function returns nonzero (true) if and only if 
the object's operations are lock-free. The result of a lock-free query on one 
object cannot be inferred from the result of a lock-free query on another 
object.


[6] DR 465: http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_465


[7] INCITS/ISO/IEC 9899-2011[2012], 6.7.2.4p3
The type name in an atomic type specifier shall not refer to an array type, a 
function type, an atomic type, or a qualified type.


[8] INCITS/ISO/IEC 9899-2011[2012], 6.7.3p3
The type modified by the _Atomic qualifier shall not be an array type or a 
function type.


[9] DR 431: http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_431


[10] INCITS/ISO/IEC 9899-2011[2012], 7.17.5p2


[11] INCITS/ISO/IEC 14882-2011[2012], 29.4p3


[12] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: 
System Programming Guide, Part 1, 8.1.1 Guaranteed Atomic Operations


[13] Can Seqlocks Get Along with Programming Language Memory Models? Hans-J. 
Boehm, http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
