Hi,

I have a revised version of the libatomic ABI draft which tries to accommodate Richard's comments. The new version is attached. The diff is also appended.

Thanks,
- Bin

diff ABI.txt ABI-1.1.txt
28a29,30
> - The versioning of the library external symbols
>
47a50,57
> Note
>
> Some 64-bit x86 ISA does not support the cmpxchg16b instruction, for
> example, some early AMD64 processors and later Intel Xeon Phi co-
> processor. Whether cmpxchg16b is supported may affect the ABI
> specification for certain atomic types. We will discuss the detail
> where it has an impact.
>
101c111,112
< _Atomic __int128 16 16 N not applicable
---
> _Atomic __int128 (with at16)    16   16   Y   not applicable
> _Atomic __int128 (w/o at16)     16   16   N   not applicable
105c116,117
< _Atomic long double 16 16 N 12 4 N
---
> _Atomic long double (with at16) 16   16   Y   12   4   N
> _Atomic long double (w/o at16)  16   16   N   12   4   N
106a119,120
> _Atomic double _Complex 16 16(8) Y 16 16(8) N
>                     (with at16)
107a122
>                     (w/o at16)
110a126,127
> _Atomic long double _Imaginary 16 16 Y 12 4 N
>                     (with at16)
111a129
>                     (w/o at16)
146a165,167
> with at16 means the ISA supports cmpxchg16b, w/o at16 means the ISA
> does not support cmpxchg16b.
>
191a213,214
> _Atomic struct {char a[16];} 16 16(1) Y 16 16(1) N
>                     (with at16)
192a216
>                     (w/o at16)
208a233,235
> with at16 means the ISA supports cmpxchg16b, w/o at16 means the ISA
> does not support cmpxchg16b.
>
246a274,276
> On the 64-bit x86 platform which supports the cmpxchg16b instruction,
> 16-byte atomic types whose alignment matches the size is inlineable.
>
303,306c333,338
< CMPXCHG16B is not always available on 64-bit x86 platforms, so 16-byte
< naturally aligned atomics are not inlineable. The support functions for
< such atomics are free to use lock-free implementation if the instruction
< is available on specific platforms.
---
> "Inlineability" is a compile time property, which in most cases depends
> only on the type. In a few cases it also depends on whether the target
> ISA supports the cmpxchg16b instruction. A compiler may get the ISA
> information by either compilation flags or inquiring the hardware
> capabilities. When the hardware capabilities information is not available,
> the compiler should assume the cmpxchg16b instruction is not supported.
665a698,705
>     The function takes the size of an object and an address which
>     is one of the following three cases
>     - the address of the object
>     - a faked address that solely indicates the alignment of the
>       object's address
>     - NULL, which means that the alignment of the object matches size
>     and returns whether the object is lock-free.
>
711c751
< 5. Libatomic Assumption on Non-blocking Memory Instructions
---
> 5. Libatomic symbol versioning
712a753,868
> Here is the mapfile for symbol versioning of the libatomic library
> specified by this ABI specification
>
> LIBATOMIC_1.0 {
>   global:
>     __atomic_load;
>     __atomic_store;
>     __atomic_exchange;
>     __atomic_compare_exchange;
>     __atomic_is_lock_free;
>
>     __atomic_add_fetch_1;
>     __atomic_add_fetch_2;
>     __atomic_add_fetch_4;
>     __atomic_add_fetch_8;
>     __atomic_add_fetch_16;
>     __atomic_and_fetch_1;
>     __atomic_and_fetch_2;
>     __atomic_and_fetch_4;
>     __atomic_and_fetch_8;
>     __atomic_and_fetch_16;
>     __atomic_compare_exchange_1;
>     __atomic_compare_exchange_2;
>     __atomic_compare_exchange_4;
>     __atomic_compare_exchange_8;
>     __atomic_compare_exchange_16;
>     __atomic_exchange_1;
>     __atomic_exchange_2;
>     __atomic_exchange_4;
>     __atomic_exchange_8;
>     __atomic_exchange_16;
>     __atomic_fetch_add_1;
>     __atomic_fetch_add_2;
>     __atomic_fetch_add_4;
>     __atomic_fetch_add_8;
>     __atomic_fetch_add_16;
>     __atomic_fetch_and_1;
>     __atomic_fetch_and_2;
>     __atomic_fetch_and_4;
>     __atomic_fetch_and_8;
>     __atomic_fetch_and_16;
>     __atomic_fetch_nand_1;
>     __atomic_fetch_nand_2;
>     __atomic_fetch_nand_4;
>     __atomic_fetch_nand_8;
>     __atomic_fetch_nand_16;
>     __atomic_fetch_or_1;
>     __atomic_fetch_or_2;
>     __atomic_fetch_or_4;
>     __atomic_fetch_or_8;
>     __atomic_fetch_or_16;
>     __atomic_fetch_sub_1;
>     __atomic_fetch_sub_2;
>     __atomic_fetch_sub_4;
>     __atomic_fetch_sub_8;
>     __atomic_fetch_sub_16;
>     __atomic_fetch_xor_1;
>     __atomic_fetch_xor_2;
>     __atomic_fetch_xor_4;
>     __atomic_fetch_xor_8;
>     __atomic_fetch_xor_16;
>     __atomic_load_1;
>     __atomic_load_2;
>     __atomic_load_4;
>     __atomic_load_8;
>     __atomic_load_16;
>     __atomic_nand_fetch_1;
>     __atomic_nand_fetch_2;
>     __atomic_nand_fetch_4;
>     __atomic_nand_fetch_8;
>     __atomic_nand_fetch_16;
>     __atomic_or_fetch_1;
>     __atomic_or_fetch_2;
>     __atomic_or_fetch_4;
>     __atomic_or_fetch_8;
>     __atomic_or_fetch_16;
>     __atomic_store_1;
>     __atomic_store_2;
>     __atomic_store_4;
>     __atomic_store_8;
>     __atomic_store_16;
>     __atomic_sub_fetch_1;
>     __atomic_sub_fetch_2;
>     __atomic_sub_fetch_4;
>     __atomic_sub_fetch_8;
>     __atomic_sub_fetch_16;
>     __atomic_test_and_set_1;
>     __atomic_test_and_set_2;
>     __atomic_test_and_set_4;
>     __atomic_test_and_set_8;
>     __atomic_test_and_set_16;
>     __atomic_xor_fetch_1;
>     __atomic_xor_fetch_2;
>     __atomic_xor_fetch_4;
>     __atomic_xor_fetch_8;
>     __atomic_xor_fetch_16;
>
>   local:
>     *;
> };
> LIBATOMIC_1.1 {
>   global:
>     __atomic_feraiseexcept;
> } LIBATOMIC_1.0;
> LIBATOMIC_1.2 {
>   global:
>     atomic_thread_fence;
>     atomic_signal_fence;
>     atomic_flag_test_and_set;
>     atomic_flag_test_and_set_explicit;
>     atomic_flag_clear;
>     atomic_flag_clear_explicit;
> } LIBATOMIC_1.1;
>
> 6. Libatomic Assumption on Non-blocking Memory Instructions
>
752,753c908,910
< So such compiler change must be accompanied by a library change, and
< the ABI must be updated as well.
---
> In such case, the libatomic library and the compiler should be upgraded
> in lock-step, and the inlineable property for certain atomic types
> will be changed from false to true.


On 7/6/2016 12:41 PM, Richard Henderson wrote:
CMPXCHG16B is not always available on 64-bit x86 platforms, so 16-byte
naturally aligned atomics are not inlineable. The support functions for
such atomics are free to use lock-free implementation if the instruction
is available on specific platforms.

Except that it is available on almost all 64-bit x86 platforms. As far as I know, only 2004 era AMD processors didn't have it; all Intel 64-bit cpus have supported it.

Further, gcc will most certainly make use of it when one specifies any command-line option that enables it, such as -march=native.

Therefore we must specify that for x86_64, 16-byte objects are non-locking on cpus that support cmpxchg16b.

However, if a compiler inlines an atomic operation on an _Atomic long
double object and uses the new lock-free instructions, it could break
the compatibility if the library implementation is still non-lock-free.
So such compiler change must be accompanied by a library change, and
the ABI must be updated as well.

The tie between gcc version and libgcc.so version is tight; I see no reason that the libatomic.so version should not also be tight with the compiler version.

It is sufficient that libatomic use atomic instructions when they are available. If a new processor comes out with new capabilities, the compiler and runtime are upgraded in lock-step.

How that is selected is beyond the ABI but possible solutions are

(1) ld.so search path, based on processor capabilities,
(2) ifunc (or workalike) where the function is selected at startup,
(3) explicit runtime test within the relevant functions.

All solutions expose the same function interface so the function call ABI is not affected.

_Bool __atomic_is_lock_free (size_t size, void *object);

    Returns whether the object pointed to by object is lock-free.
    The function assumes that the size of the object is size. If object
    is NULL then the function assumes that object is aligned on an
    size-byte address.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65033

The actual code change is completely within libstdc++, but it affects the description of the libatomic function.

C++ requires that is_lock_free return the same result for all objects of a given type. Whereas __atomic_is_lock_free, with a non-null object, determines if we will implement lock free for a *specific* object, using the specific object's alignment.

Rather than break the ABI and add a different function that passes the type alignment, the solution we hit upon was to pass a "fake", minimally aligned pointer as the object parameter: (void *)(uintptr_t)-__alignof(type).


The final component of the ABI that you've forgotten to specify, if you want full compatibility of linked binaries, is symbol versioning.

We have had two ABI additions since the original release.  See

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libatomic/libatomic.map;h=39e7c2c6b9a70121b5f4031da346a27ae6c1be98;hb=HEAD


r~

1. Overview

1.1. Why we need an ABI for atomics

The C11 standard allows different size, representation and alignment
between atomic types and the corresponding non-atomic types [1].
The size, representation and alignment of atomic types therefore need
to be specified in the ABI specification.

A runtime support library, libatomic, already exists on Solaris
and Linux. The interface of this library needs to be standardized
as part of the ABI specification, so that

- On a system that supplies libatomic, all compilers in compliance
  with the ABI can generate compatible binaries that link against
  this library.

- Binaries remain backward compatible across different versions of
  the system, as long as those versions support the same ABI.

1.2. What does the atomics ABI specify

The ABI specifies the following

- Data representation of the atomic types.

- The names and behaviors of the implementation-specific support
  functions.

- The versioning of the library external symbols

- The atomic types for which the compiler may generate inlined code. 

- Lock-free property of the inlined atomic operations.

Note that the names and behaviors of the libatomic functions specified
in the C standard do not need to be part of this ABI, because they
are already required to meet the standard's specification.

1.3. Affected platforms

The following platforms are affected by this ABI specification.

SPARC (32-bit and 64-bit)
x86 (32-bit and 64-bit)

Sections 1.1 and 1.2, and the Rationale, Notes and Appendix sections
in the rest of the document, are for explanatory purposes only; they
are not considered part of the formal ABI specification.

Note

Some 64-bit x86 processors do not support the cmpxchg16b instruction;
examples include some early AMD64 processors and the later Intel Xeon
Phi co-processor. Whether cmpxchg16b is supported may affect the ABI
specification for certain atomic types. We discuss the details where
this has an impact.

2. Data Representation

2.1. General Rules

The general rules for size, representation and alignment of the data
representation of atomic types are the following

1) Atomic types have the same size as the corresponding non-atomic
   types.

2) Atomic types have the same representation as the corresponding
   non-atomic types.

3) Atomic types have the same alignment as the corresponding
   non-atomic types, with the following exceptions:

   On 32- and 64-bit x86 platforms and on 64-bit SPARC platforms,
   atomic types of size 1, 2, 4, 8 or 16 bytes have an alignment
   that matches the size.

   On 32-bit SPARC platforms, atomic types of size 1, 2, 4 or 8 bytes
   have an alignment that matches the size. If the alignment of a
   16-byte non-atomic type is less than 8 bytes, the alignment of the
   corresponding atomic type is increased to 8 bytes.

Note 

The above rules apply to both scalar types and aggregate types.

2.2. Atomic scalar types

x86

                                          LP64 (AMD64)                     ILP32 (i386)
C Type                          sizeof    Alignment  Inlineable  sizeof    Alignment  Inlineable
atomic_flag                     1         1          Y           1         1          Y
_Atomic _Bool                   1         1          Y           1         1          Y
_Atomic char                    1         1          Y           1         1          Y
_Atomic signed char             1         1          Y           1         1          Y
_Atomic unsigned char           1         1          Y           1         1          Y
_Atomic short                   2         2          Y           2         2          Y
_Atomic signed short            2         2          Y           2         2          Y
_Atomic unsigned short          2         2          Y           2         2          Y
_Atomic int                     4         4          Y           4         4          Y
_Atomic signed int              4         4          Y           4         4          Y
_Atomic enum                    4         4          Y           4         4          Y
_Atomic unsigned int            4         4          Y           4         4          Y
_Atomic long                    8         8          Y           4         4          Y
_Atomic signed long             8         8          Y           4         4          Y
_Atomic unsigned long           8         8          Y           4         4          Y
_Atomic long long               8         8          Y           8         8          Y
_Atomic signed long long        8         8          Y           8         8          Y
_Atomic unsigned long long      8         8          Y           8         8          Y
_Atomic __int128 (with at16)    16        16         Y           not applicable
_Atomic __int128 (w/o at16)     16        16         N           not applicable
any-type _Atomic *              8         8          Y           4         4          Y
_Atomic float                   4         4          Y           4         4          Y
_Atomic double                  8         8          Y           8         8          Y
_Atomic long double (with at16) 16        16         Y           12        4          N
_Atomic long double (w/o at16)  16        16         N           12        4          N
_Atomic float _Complex          8         8(4)       Y           8         8(4)       Y
_Atomic double _Complex         16        16(8)      Y           16        16(8)      N
                    (with at16)
_Atomic double _Complex         16        16(8)      N           16        16(8)      N
                    (w/o at16)
_Atomic long double _Complex    32        16         N           24        4          N
_Atomic float _Imaginary        4         4          Y           4         4          Y
_Atomic double _Imaginary       8         8          Y           8         8          Y
_Atomic long double _Imaginary  16        16         Y           12        4          N
                    (with at16)
_Atomic long double _Imaginary  16        16         N           12        4          N
                    (w/o at16)

SPARC

                                          LP64 (v9)                        ILP32 (sparc)
C Type                          sizeof    Alignment  Inlineable  sizeof    Alignment  Inlineable
atomic_flag                     1         1          Y           1         1          Y
_Atomic _Bool                   1         1          Y           1         1          Y
_Atomic char                    1         1          Y           1         1          Y
_Atomic signed char             1         1          Y           1         1          Y
_Atomic unsigned char           1         1          Y           1         1          Y
_Atomic short                   2         2          Y           2         2          Y
_Atomic signed short            2         2          Y           2         2          Y
_Atomic unsigned short          2         2          Y           2         2          Y
_Atomic int                     4         4          Y           4         4          Y
_Atomic signed int              4         4          Y           4         4          Y
_Atomic enum                    4         4          Y           4         4          Y
_Atomic unsigned int            4         4          Y           4         4          Y
_Atomic long                    8         8          Y           4         4          Y
_Atomic signed long             8         8          Y           4         4          Y
_Atomic unsigned long           8         8          Y           4         4          Y
_Atomic long long               8         8          Y           8         8          Y
_Atomic signed long long        8         8          Y           8         8          Y
_Atomic unsigned long long      8         8          Y           8         8          Y
_Atomic __int128                16        16         N           not applicable
any-type _Atomic *              8         8          Y           4         4          Y
_Atomic float                   4         4          Y           4         4          Y
_Atomic double                  8         8          Y           8         8          Y
_Atomic long double             16        16         N           16        8          N
_Atomic float _Complex          8         8(4)       Y           8         8(4)       Y
_Atomic double _Complex         16        16(8)      N           16        8          N
_Atomic long double _Complex    32        16         N           32        8          N
_Atomic float _Imaginary        4         4          Y           4         4          Y
_Atomic double _Imaginary       8         8          Y           8         8          Y
_Atomic long double _Imaginary  16        16         N           16        8          N

"with at16" means the ISA supports the cmpxchg16b instruction; "w/o
at16" means the ISA does not support it.

Notes: 

The C standard also specifies some atomic integer types. They are not
listed in the above table because they have the same representation
and alignment requirements as the corresponding direct types [2].

We will discuss the inlineable column and __int128 type in section 3.

The value in () shows the alignment of the corresponding non-atomic 
type, if it is different from the alignment of the atomic type.

Because the _Atomic specifier cannot be used on a function type [7]
and the _Atomic qualifier cannot modify a function type [8], no
atomic function types are listed in the above table.

On 32-bit x86 platforms, long double has a size of 12 bytes and an
alignment of 4 bytes. This ABI specification does not increase the
alignment of the _Atomic long double type because it would not be
lock-free even if it were 16-byte aligned: there are no 12-byte or
16-byte lock-free instructions on 32-bit x86 platforms.

2.3 Atomic Aggregates and Unions

Atomic structures or unions may have different alignment compared to
the corresponding non-atomic types, subject to rule 3) in section 2.1. 
The alignment change only affects the boundary where an entire 
structure or union is aligned. The offset of each member, the internal 
padding and the size of the structure or union are not affected.

The following table shows selective examples of the size and alignment
of atomic structure types.

x86

                                          LP64 (AMD64)                      ILP32 (i386)
C Type                          sizeof    Alignment  Inlineable   sizeof    Alignment  Inlineable
_Atomic struct {char a[2];}     2         2(1)       Y            2         2(1)       Y
_Atomic struct {char a[3];}     3         1          N            3         1          N
_Atomic struct {short a[2];}    4         4(2)       Y            4         4(2)       Y
_Atomic struct {int a[2];}      8         8(4)       Y            8         8(4)       Y
_Atomic struct {char c;
                int i;}         8         8(4)       Y            8         8(4)       Y
_Atomic struct {char c[2];
                short s;
                int i;}         8         8(4)       Y            8         8(4)       Y
_Atomic struct {char a[16];}    16        16(1)      Y            16        16(1)      N
                    (with at16)
_Atomic struct {char a[16];}    16        16(1)      N            16        16(1)      N
                    (w/o at16)

SPARC

                                          LP64 (v9)                        ILP32 (sparc)
C Type                          sizeof    Alignment  Inlineable   sizeof    Alignment  Inlineable
_Atomic struct {char a[2];}     2         2(1)       Y            2         2(1)       Y
_Atomic struct {char a[3];}     3         1          N            3         1          N
_Atomic struct {short a[2];}    4         4(2)       Y            4         4(2)       Y
_Atomic struct {int a[2];}      8         8(4)       Y            8         8(4)       Y
_Atomic struct {char c;
                int i;}         8         8(4)       Y            8         8(4)       Y
_Atomic struct {char c[2];
                short s;
                int i;}         8         8(4)       Y            8         8(4)       Y
_Atomic struct {char a[16];}    16        16(1)      N            16        8(1)       N

"with at16" means the ISA supports the cmpxchg16b instruction; "w/o
at16" means the ISA does not support it.

Notes

The value in () shows the alignment of the corresponding non-atomic 
type, if it is different from the alignment of the atomic type.

Because the padding of structure types is not affected by the _Atomic
modifier, the contents of any padding in an atomic structure object
are still undefined; therefore an atomic compare-and-exchange
operation on such objects may fail due to differences in the padding.

The increased alignment of 16-byte atomic struct types may be useful
to
- Reduce the sharing of locks with other atomics.
- Allow more efficient implementations of the runtime support
  functions for atomic operations on such types.

2.4. Bit-fields

Whether atomic bit-field types are permitted is implementation-defined
in the C standard [3]. In this ABI specification, the representation
of atomic bit-fields is unspecified.

3. Lock-free and Inlineable Property

The implementation of atomic operations may map directly to hardware
atomic instructions. Such an implementation is lock-free.

Lock-free atomic operations do not require runtime support functions,
and the compiler may generate inlined code for efficiency. This ABI
specification defines a set of inlineable atomic types. An atomic type
being inlineable means that the compiler may generate an inlined
instruction sequence for atomic operations on objects of that type.
The implementation of the support functions for the inlineable atomic
types must also be lock-free.

On all affected platforms, atomic types whose size equals 1, 2, 4
or 8 and whose alignment matches the size are inlineable.

On 64-bit x86 platforms that support the cmpxchg16b instruction,
16-byte atomic types whose alignment matches the size are inlineable.

If an atomic type is not inlineable, the compiler shall always
generate a support function call for atomic operations on objects of
that type. The implementation of the support functions for
non-inlineable atomic types may be lock-free.

Rationale

It is assumed that there is no way for an atomic object to be
accessed by both lock-free and non-lock-free operations while still
satisfying the atomic semantics.

If the compiler always generates runtime support function calls for 
all atomics, the lock-free property would be hidden inside the library 
implementation. However, the compiler may inline the atomic operations, 
and we want to allow such inlining optimizations.

Compiler inlining raises the issue of mix-and-matched accesses to the
same atomic object from compiler-generated code and from the runtime
library functions; the two have to be consistent on the lock-free
property.

One possible solution to achieve lock-free consistency is to specify
the lock-free property on a per-type basis. The C and C++ standards
seem to back this approach: the C++ standard provides a query that
returns a per-type result about whether the type is lock-free [4].
The C standard does not guarantee that the query result is per-type
[5], but that is the direction in which it is going [6]. However, the
query result does not necessarily reflect the implementation of the
atomic operations on the queried type. The implementation may use
lock-free instructions for a specific object that meets certain
criteria. So specifying the lock-free property on a per-type basis is
unnecessarily conservative.

It is possible to specify the lock-free property on a per-object
basis. But it is simpler to disallow the compiler from inlining the
atomic operations on "may be lock-free" types, in order to hide the
lock-free optimization inside the library implementation.

So the ABI achieves lock-free consistency by specifying which types
may be inlined and requiring that those types be lock-free. For the
inlineable atomic types, any mix-and-matched accesses must both be
lock-free; for the non-inlineable atomic types, the compiler never
inlines, so mix-and-match never happens.

Notes:

Here are a few examples of small types which do not qualify as
inlineable types:

  _Atomic struct {char a[3];} /* size = 3, alignment = 1 */
  _Atomic long double /* (on 32-bit x86) size = 12, alignment = 4 */

A smart compiler may know that such an object is located at an
address that fits in an 8-byte aligned window, but the ABI-compliant
behavior is to not generate a lock-free inlined code sequence, since
a lazy compiler may generate a runtime support function call which
may not be implemented lock-free.

"Inlineability" is a compile-time property which in most cases
depends only on the type. In a few cases it also depends on whether
the target ISA supports the cmpxchg16b instruction. A compiler may
obtain the ISA information from compilation flags or by querying the
hardware capabilities. When hardware capability information is not
available, the compiler should assume the cmpxchg16b instruction is
not supported.

4. libatomic library functions

4.1. Data Definitions

This section contains examples of system header files that provide
the data interface needed by the libatomic functions.

<stdatomic.h>

typedef enum
{
    memory_order_relaxed = 0,
    memory_order_consume = 1,
    memory_order_acquire = 2,
    memory_order_release = 3,
    memory_order_acq_rel = 4,
    memory_order_seq_cst = 5
} memory_order;

typedef _Atomic struct
{
  unsigned char __flag;
} atomic_flag;

Refer to the C standard for the meaning of each enumeration constant
of the memory_order type.

<fenv.h>

SPARC

#define FE_INEXACT    0x01
#define FE_DIVBYZERO  0x02
#define FE_UNDERFLOW  0x04
#define FE_OVERFLOW   0x08
#define FE_INVALID    0x10

x86

#define FE_INVALID    0x01
#define FE_DIVBYZERO  0x04
#define FE_OVERFLOW   0x08
#define FE_UNDERFLOW  0x10
#define FE_INEXACT    0x20

4.2. Support Functions

The following kinds of atomic operations are supported by the runtime
library: load, store, exchange, compare-and-exchange and arithmetic
read-modify-write operations. For the arithmetic read-modify-write
operations, the following modification operations are supported:
addition, subtraction, bitwise inclusive or, bitwise exclusive or,
bitwise and, and bitwise nand. There are also classic test-and-set
functions.

For each kind of atomic operation, libatomic provides a generic
version that accepts a pointer to an object of any atomic type, and a
set of functions that accept pointers to atomic types of size 1, 2, 4
and 8 bytes on all platforms, plus 16 bytes on 64-bit platforms.

Note: Section 2.1 mentions the alignment adjustment for atomic types
of sizes 1, 2, 4, 8 and 16 bytes. For load, store, exchange and
compare-and-exchange operations, it is safe to convert a pointer to
any atomic type of those sizes to a pointer to the corresponding
atomic integer type of the same size.

Note: The size-specific versions accept and return data by value;
the generic version uses memory pointers to pass and return the data
objects.

Most of the functions listed in this section map to generic functions
with the same semantics in the C standard. Refer to the C standard
for the description of those generic functions and of how each memory
order works.

The following functions are available on all platforms.

void __atomic_load (size_t size, void *object, void *loaded, memory_order order);

    Atomically load the value pointed to by object. Assign the loaded
    value to the memory pointed to by loaded. The size of memory
    affected by the load is designated by size.

int8_t __atomic_load_1 (int8_t *object, memory_order order);
int16_t __atomic_load_2 (int16_t *object, memory_order order);
int32_t __atomic_load_4 (int32_t *object, memory_order order);
int64_t __atomic_load_8 (int64_t *object, memory_order order);

    Atomically load the value pointed to by object. The loaded value is
    returned. The size of memory affected by the load is designated by
    the type of the object. If object is not aligned properly according 
    to the type of object, the behavior is undefined.

    Memory is affected according to the value of order. If order is either
    memory_order_release or memory_order_acq_rel, the behavior of the 
    function is undefined.

void __atomic_store (size_t size, void *object, void *desired, memory_order order);

    Atomically replace the value pointed to by object with the value
    pointed to by desired. The size of memory affected by the store
    is designated by size.

void __atomic_store_1 (int8_t *object, int8_t desired, memory_order order);
void __atomic_store_2 (int16_t *object, int16_t desired, memory_order order);
void __atomic_store_4 (int32_t *object, int32_t desired, memory_order order);
void __atomic_store_8 (int64_t *object, int64_t desired, memory_order order);

    Atomically replace the value pointed to by object with desired.
    The size of memory affected by the store is designated by the
    type of the object. If object is not aligned properly according 
    to the type of object, the behavior is undefined.

    Memory is affected according to the value of order. If order is one of
    memory_order_acquire, memory_order_consume or memory_order_acq_rel, the
    behavior of the function is undefined.

void __atomic_exchange (size_t size, void *object, void *desired, void *loaded, memory_order order);

    Atomically replace the value pointed to by object with the value
    pointed to by desired, and assign to the memory pointed to by
    loaded the value pointed to by object immediately before the
    effect. The size of memory affected by the exchange is designated
    by size.

int8_t __atomic_exchange_1 (int8_t *object, int8_t desired, memory_order order);
int16_t __atomic_exchange_2 (int16_t *object, int16_t desired, memory_order order);
int32_t __atomic_exchange_4 (int32_t *object, int32_t desired, memory_order order);
int64_t __atomic_exchange_8 (int64_t *object, int64_t desired, memory_order order);

    Atomically, replace the value pointed to by object with desired 
    and return the value pointed to by object immediately before the 
    effect. The size of memory affected by the exchange is designated 
    by the type of object. If object is not aligned properly according 
    to the type of object, the behavior is undefined.

    Memory is affected according to the value of order.

_Bool __atomic_compare_exchange (size_t size, void *object, void *expected, void *desired, memory_order success_order, memory_order failure_order);

    Atomically compares the memory pointed to by object with the memory 
    pointed to by expected; if they are equal, replaces the memory 
    pointed to by object with the memory pointed to by desired, 
    otherwise updates the memory pointed to by expected with the memory 
    pointed to by object. The result of the comparison is returned. The 
    size of memory affected by the compare and exchange is designated 
    by size.

    The compare and exchange never fails spuriously, i.e. if the comparison 
    for equality returns false, the two values in the comparison were not 
    equal. [Note: this specifies that on SPARC and x86, compare exchange 
    is always implemented with the "strong" semantics. The weak flavors in 
    the C standard are translated to strong.]

_Bool __atomic_compare_exchange_1 (int8_t *object, int8_t *expected, int8_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_2 (int16_t *object, int16_t *expected, int16_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_4 (int32_t *object, int32_t *expected, int32_t 
desired, memory_order success_order, memory_order failure_order);
_Bool __atomic_compare_exchange_8 (int64_t *object, int64_t *expected, int64_t 
desired, memory_order success_order, memory_order failure_order);

    Atomically compares the memory pointed to by object with the memory 
    pointed to by expected; if they are equal, replaces the memory 
    pointed to by object with desired, otherwise updates the memory 
    pointed to by expected with the memory pointed to by object. The 
    result of the comparison is returned.

    The size of memory affected by the compare and exchange is designated 
    by the type of object. If object is not aligned properly according 
    to the type of object, the behavior is undefined.

    The compare and exchange never fails spuriously, i.e. if the comparison 
    for equality returns false, the two values in the comparison were not 
    equal.

    If the comparison is true, memory is affected according to the 
    value of success_order, and if the comparison is false, memory is 
    affected according to the value of failure_order.

int8_t __atomic_add_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_add_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_add_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_add_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically replaces the value pointed to by object with the result of
    the value pointed to by object plus operand and returns the value
    pointed to by object immediately after the effects. If object is 
    not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

int8_t __atomic_fetch_add_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_add_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_add_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_add_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically replaces the value pointed to by object with the result of
    the value pointed to by object plus operand and returns the value
    pointed to by object immediately before the effects. If object is 
    not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Memory is affected according to the value of order.

int8_t __atomic_sub_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_sub_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_sub_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_sub_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically replaces the value pointed to by object with the result of
    the value pointed to by object minus operand and returns the value
    pointed to by object immediately after the effects. If object is not 
    aligned properly according to the type of object, the behavior is 
    undefined. The size of memory affected by the effects is designated 
    by the type of object.

int8_t __atomic_fetch_sub_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_sub_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_sub_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_sub_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically replaces the value pointed to by object with the result of
    the value pointed to by object minus operand and returns the value
    pointed to by object immediately before the effects. If object is 
    not aligned properly according to the type of object, the behavior 
    is undefined.  The size of memory affected by the effects is 
    designated by the type of object.

    Memory is affected according to the value of order.

int8_t __atomic_and_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_and_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_and_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_and_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise and of the value pointed to by object and operand and returns 
    the value pointed to by object immediately after the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined.  The size of memory affected by the effects is designated 
    by the type of object.  

int8_t __atomic_fetch_and_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_and_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_and_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_and_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise and of the value pointed to by object and operand and returns 
    the value pointed to by object immediately before the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Memory is affected according to the value of order.

int8_t __atomic_or_fetch_1 (int8_t *object, int8_t operand, memory_order order);
int16_t __atomic_or_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_or_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_or_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise or of the value pointed to by object and operand and returns 
    the value pointed to by object immediately after the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

int8_t __atomic_fetch_or_1 (int8_t *object, int8_t operand, memory_order order);
int16_t __atomic_fetch_or_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_or_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_or_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise or of the value pointed to by object and operand and returns 
    the value pointed to by object immediately before the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Memory is affected according to the value of order.

int8_t __atomic_xor_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_xor_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_xor_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_xor_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise xor of the value pointed to by object and operand and returns 
    the value pointed to by object immediately after the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

int8_t __atomic_fetch_xor_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_xor_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_xor_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_xor_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise xor of the value pointed to by object and operand and returns 
    the value pointed to by object immediately before the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Memory is affected according to the value of order.

int8_t __atomic_nand_fetch_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_nand_fetch_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_nand_fetch_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_nand_fetch_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise nand of the value pointed to by object and operand and returns 
    the value pointed to by object immediately after the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Bitwise operator nand is defined as the following using ANSI C 
    operators: a nand b is equivalent to ~(a & b).

int8_t __atomic_fetch_nand_1 (int8_t *object, int8_t operand, memory_order 
order);
int16_t __atomic_fetch_nand_2 (int16_t *object, int16_t operand, memory_order 
order);
int32_t __atomic_fetch_nand_4 (int32_t *object, int32_t operand, memory_order 
order);
int64_t __atomic_fetch_nand_8 (int64_t *object, int64_t operand, memory_order 
order);

    Atomically, replaces the value pointed to by object with the result of 
    bitwise nand of the value pointed to by object and operand and returns 
    the value pointed to by object immediately before the effects. If object 
    is not aligned properly according to the type of object, the behavior 
    is undefined. The size of memory affected by the effects is designated 
    by the type of object.

    Bitwise operator nand is defined as the following using ANSI C 
    operators: a nand b is equivalent to ~(a & b).

    Memory is affected according to the value of order.

_Bool __atomic_test_and_set_1 (int8_t *object, memory_order order);
_Bool __atomic_test_and_set_2 (int16_t *object, memory_order order);
_Bool __atomic_test_and_set_4 (int32_t *object, memory_order order);
_Bool __atomic_test_and_set_8 (int64_t *object, memory_order order);

    Atomically sets the value pointed to by object to the set state and 
    returns the value it held immediately before the effect, i.e. 
    returns false if it was in the clear state and true if it was 
    already in the set state. The size of memory affected by the 
    effects is always one byte.

    Memory is affected according to the value of order.

    The set and clear state are the same as specified for 
    atomic_flag_test_and_set.

_Bool __atomic_is_lock_free (size_t size, void *object);

    Returns whether the object pointed to by object is lock-free.
    The function assumes that the size of the object is size. If object 
    is NULL, the function assumes that the object is aligned on a 
    size-byte boundary.

    The function takes the size of an object and an address, which
    is one of the following three cases:
    - the address of the object, 
    - a faked address that solely indicates the alignment of the 
      object's address, or
    - NULL, which means that the alignment of the object matches size, 
    and returns whether such an object is lock-free.

void __atomic_feraiseexcept (int exception);

   Raises the floating-point exception(s) specified by exception. 
   The int argument exception represents a subset of the 
   floating-point exceptions, and can be zero or the bitwise 
   OR of one or more floating-point exception macros. The macros
   are defined in fenv.h in section 4.1.

4.3. 64-bit Specific Interfaces

4.3.1. Data Representation of __int128 type

On x86 platforms, __int128 type is defined in the 64-bit ABI.

On SPARC platforms, the size and alignment of __int128 type is 
specified as the following:

             sizeof   Alignment
__int128       16        16     

4.3.2. Support Functions

The following functions are available only on 64-bit platforms. 

__int128 __atomic_load_16 (__int128 *object, memory_order order);
void __atomic_store_16 (__int128 *object, __int128 desired, memory_order order);
__int128 __atomic_exchange_16 (__int128 * object,  __int128 desired, 
memory_order order);
_Bool __atomic_compare_exchange_16 (__int128 *object, __int128 *expected, 
__int128 desired, memory_order success_order, memory_order failure_order);
__int128 __atomic_add_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_add_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_sub_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_sub_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_and_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_and_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_or_fetch_16 (__int128 *object, __int128 operand, memory_order 
order);
__int128 __atomic_fetch_or_16 (__int128 *object, __int128 operand, memory_order 
order);
__int128 __atomic_xor_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_xor_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_nand_fetch_16 (__int128 *object, __int128 operand, 
memory_order order);
__int128 __atomic_fetch_nand_16 (__int128 *object, __int128 operand, 
memory_order order);
_Bool __atomic_test_and_set_16 (__int128 *object, memory_order order);

The description of each function is the same as that of the corresponding
function specified in section 4.2.

5. Libatomic symbol versioning

Here is the mapfile for symbol versioning of the libatomic library 
specified by this ABI specification

LIBATOMIC_1.0 {
  global:
    __atomic_load;
    __atomic_store;
    __atomic_exchange;
    __atomic_compare_exchange;
    __atomic_is_lock_free;

    __atomic_add_fetch_1;
    __atomic_add_fetch_2;
    __atomic_add_fetch_4;
    __atomic_add_fetch_8;
    __atomic_add_fetch_16;
    __atomic_and_fetch_1;
    __atomic_and_fetch_2;
    __atomic_and_fetch_4;
    __atomic_and_fetch_8;
    __atomic_and_fetch_16;
    __atomic_compare_exchange_1;
    __atomic_compare_exchange_2;
    __atomic_compare_exchange_4;
    __atomic_compare_exchange_8;
    __atomic_compare_exchange_16;
    __atomic_exchange_1;
    __atomic_exchange_2;
    __atomic_exchange_4;
    __atomic_exchange_8;
    __atomic_exchange_16;
    __atomic_fetch_add_1;
    __atomic_fetch_add_2;
    __atomic_fetch_add_4;
    __atomic_fetch_add_8;
    __atomic_fetch_add_16;
    __atomic_fetch_and_1;
    __atomic_fetch_and_2;
    __atomic_fetch_and_4;
    __atomic_fetch_and_8;
    __atomic_fetch_and_16;
    __atomic_fetch_nand_1;
    __atomic_fetch_nand_2;
    __atomic_fetch_nand_4;
    __atomic_fetch_nand_8;
    __atomic_fetch_nand_16;
    __atomic_fetch_or_1;
    __atomic_fetch_or_2;
    __atomic_fetch_or_4;
    __atomic_fetch_or_8;
    __atomic_fetch_or_16;
    __atomic_fetch_sub_1;
    __atomic_fetch_sub_2;
    __atomic_fetch_sub_4;
    __atomic_fetch_sub_8;
    __atomic_fetch_sub_16;
    __atomic_fetch_xor_1;
    __atomic_fetch_xor_2;
    __atomic_fetch_xor_4;
    __atomic_fetch_xor_8;
    __atomic_fetch_xor_16;
    __atomic_load_1;
    __atomic_load_2;
    __atomic_load_4;
    __atomic_load_8;
    __atomic_load_16;
    __atomic_nand_fetch_1;
    __atomic_nand_fetch_2;
    __atomic_nand_fetch_4;
    __atomic_nand_fetch_8;
    __atomic_nand_fetch_16;
    __atomic_or_fetch_1;
    __atomic_or_fetch_2;
    __atomic_or_fetch_4;
    __atomic_or_fetch_8;
    __atomic_or_fetch_16;
    __atomic_store_1;
    __atomic_store_2;
    __atomic_store_4;
    __atomic_store_8;
    __atomic_store_16;
    __atomic_sub_fetch_1;
    __atomic_sub_fetch_2;
    __atomic_sub_fetch_4;
    __atomic_sub_fetch_8;
    __atomic_sub_fetch_16;
    __atomic_test_and_set_1;
    __atomic_test_and_set_2;
    __atomic_test_and_set_4;
    __atomic_test_and_set_8;
    __atomic_test_and_set_16;
    __atomic_xor_fetch_1;
    __atomic_xor_fetch_2;
    __atomic_xor_fetch_4;
    __atomic_xor_fetch_8;
    __atomic_xor_fetch_16;

  local:
    *;
};
LIBATOMIC_1.1 {
  global:
    __atomic_feraiseexcept;
} LIBATOMIC_1.0;
LIBATOMIC_1.2 {
  global:
    atomic_thread_fence;
    atomic_signal_fence;
    atomic_flag_test_and_set;
    atomic_flag_test_and_set_explicit;
    atomic_flag_clear;
    atomic_flag_clear_explicit;
} LIBATOMIC_1.1;

6. Libatomic Assumption on Non-blocking Memory Instructions

libatomic assumes that programmers or compilers properly insert 
SFENCE/MFENCE barriers for the following cases:

1) writes executed with the CLFLUSH instruction,
2) streaming loads/stores ((V)MOVNTx, MASKMOVDQU, MASKMOVQ), and
3) any other operations which reference the Write Combining memory type.

Rationale

x86 has a strong memory model: memory reads are not reordered with 
other reads, and writes are not reordered with reads or other writes. 
The three cases mentioned above are exceptions, i.e. those writes are 
not ordered with respect to other writes. 
The ABI specifies that code using those non-blocking writes must 
contain proper fences, so that libatomic support functions do not need 
fences to synchronize with those instructions.

Appendix

A.1. Compatibility Notes

On 64-bit SPARC platforms, _Atomic long double is a 16-byte naturally 
aligned atomic type. There is no lock-free instruction for such a type 
in the 64-bit SPARC ISA, and it is not inlineable in this ABI 
specification, so the libatomic implementation has to use a 
non-lock-free implementation for atomic operations on this type. 

If, in the future, lock-free instructions for 16-byte naturally aligned 
objects become available in a new SPARC ISA, libatomic could leverage 
them to implement lock-free atomic operations for _Atomic long double.

This would be a backward-compatible libatomic change: because the type 
is not inlineable, all atomic operations on objects of the type must go 
through libatomic function calls, so the non-lock-free operations would 
simply become lock-free inside those libatomic functions. 

However, if a compiler inlines an atomic operation on an _Atomic long 
double object and uses the new lock-free instructions, it could break 
compatibility if the library implementation is still non-lock-free. 
In that case, the libatomic library and the compiler must be upgraded 
in lock-step, and the inlineable property for the affected atomic types 
changes from false to true.

If a compiler changes the data representation of atomic types, such a 
change will produce incompatible binaries, and it would be hard to 
detect whether incompatible binaries have been linked together.

References

[1] C11 Standard, 6.2.5p27
The size, representation, and alignment of an atomic type need not be 
the same as those of the corresponding unqualified type.

[2] C11 Standard, 7.17.6p1
For each line in the following table,257) the atomic type name is 
declared as a type that has the same representation and alignment 
requirements as the corresponding direct type.258)

Footnote 258 
258) The same representation and alignment requirements are meant to 
imply interchangeability as arguments to functions, return values from 
functions, and members of unions.

[3] C11 Standard, 6.7.2.1p5
A bit-field shall have a type that is a qualified or unqualified 
version of _Bool, signed int, unsigned int, or some other 
implementation-defined type. It is implementation-defined whether 
atomic types are permitted.

[4] C++11 Standard, 29.4p2
The function atomic_is_lock_free (29.6) indicates whether the object 
is lock-free. In any given program execution, the result of the 
lock-free query shall be consistent for all pointers of the same type.

[5] C11 Standard, 7.17.5.1p3
The atomic_is_lock_free generic function returns nonzero (true) if 
and only if the object's operations are lock-free. The result of a 
lock-free query on one object cannot be inferred from the result of 
a lock-free query on another object.

[6] http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_465

[7] C11 Standard, 6.7.2.4p3
The type name in an atomic type specifier shall not refer to an array 
type, a function type, an atomic type, or a qualified type.

[8] C11 Standard, 6.7.3p3
The type modified by the _Atomic qualifier shall not be an array type 
or a function type.
