On 11/04/2011 04:34 PM, Joseph S. Myers wrote:
> Likewise.
I think I got all those, plus a couple more I noticed along the way.
	* doc/extend.texi: Document __atomic built-in functions.
	* doc/invoke.texi: Document data race parameters.
	* doc/md.texi: Document atomic patterns.

Index: extend.texi
===================================================================
*** extend.texi	(revision 180839)
--- extend.texi	(working copy)
*************** extensions, accepted by GCC in C90 mode
*** 79,85 ****
  * Return Address:: Getting the return or frame address of a function.
  * Vector Extensions:: Using vector instructions through built-in functions.
  * Offsetof:: Special syntax for implementing @code{offsetof}.
! * Atomic Builtins:: Built-in functions for atomic memory access.
  * Object Size Checking:: Built-in functions for limited buffer overflow
  checking.
  * Other Builtins:: Other built-in functions.
--- 79,86 ----
  * Return Address:: Getting the return or frame address of a function.
  * Vector Extensions:: Using vector instructions through built-in functions.
  * Offsetof:: Special syntax for implementing @code{offsetof}.
! * __sync Builtins:: Legacy built-in functions for atomic memory access.
! * __atomic Builtins:: Atomic built-in functions with memory model.
  * Object Size Checking:: Built-in functions for limited buffer overflow
  checking.
  * Other Builtins:: Other built-in functions.
*************** is a suitable definition of the @code{of
*** 6682,6689 ****
  may be dependent. In either case, @var{member} may consist of a single
  identifier, or a sequence of member accesses and array references.
  
! @node Atomic Builtins
! @section Built-in functions for atomic memory access
  
  The following builtins are intended to be compatible with those described
  in the @cite{Intel Itanium Processor-specific Application Binary Interface},
--- 6683,6690 ----
  may be dependent. In either case, @var{member} may consist of a single
  identifier, or a sequence of member accesses and array references.
  
! @node __sync Builtins
! @section Legacy __sync built-in functions for atomic memory access
  
  The following builtins are intended to be compatible with those described
  in the @cite{Intel Itanium Processor-specific Application Binary Interface},
*************** previous memory loads have been satisfie
*** 6815,6820 ****
--- 6816,7053 ----
  are not prevented from being speculated to before the barrier.
  @end table
+ 
+ @node __atomic Builtins
+ @section Built-in functions for memory model aware atomic operations
+ 
+ The following built-in functions approximately match the requirements for
+ the C++11 memory model. Many are similar to the @samp{__sync} prefixed
+ built-in functions, but all also have a memory model parameter. These are
+ all identified by being prefixed with @samp{__atomic}, and most are
+ overloaded such that they work with multiple types.
+ 
+ GCC will allow any integral scalar or pointer type that is 1, 2, 4, or 8
+ bytes in length. 16-byte integral types are also allowed if
+ @samp{__int128} (@pxref{__int128}) is supported by the architecture.
+ 
+ Target architectures are encouraged to provide their own patterns for
+ each of these built-in functions. If no target pattern is provided, the
+ original non-memory model set of @samp{__sync} atomic built-in functions
+ will be utilized, along with any required synchronization fences
+ surrounding it in order to achieve the proper behaviour. Execution in
+ this case is subject to the same restrictions as those built-in functions.
+ 
+ If there is no pattern or mechanism to provide a lock free instruction
+ sequence, a call is made to an external routine with the same parameters
+ to be resolved at runtime.
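+ 
+ As a brief illustration (a sketch only; the individual built-in
+ functions and memory models are documented below), a simple event
+ counter with no ordering requirements might be updated as follows:
+ 
+ @smallexample
+ long counter;  /* hypothetical shared counter */
+ 
+ void
+ count_event (void)
+ @{
+   /* No ordering with other memory operations is needed,
+      so the relaxed model suffices.  */
+   __atomic_add_fetch (&counter, 1, __ATOMIC_RELAXED);
+ @}
+ @end smallexample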
+ 
+ The four non-arithmetic functions (load, store, exchange, and
+ compare_exchange) all have a generic version as well. This generic
+ version will work on any data type. If the data type size maps to one
+ of the integral sizes which may have lock free support, the generic
+ version will utilize the lock free built-in function. Otherwise an
+ external call is left to be resolved at runtime. This external call
+ uses the same format, with the addition of a @samp{size_t} parameter
+ inserted as the first parameter indicating the size of the object
+ being pointed to. All objects must be the same size.
+ 
+ There are 6 different memory models which can be specified. These map
+ to the same names in the C++11 standard. Refer there or to the
+ @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
+ atomic synchronization} for more detailed definitions. These memory
+ models integrate both barriers to code motion as well as synchronization
+ requirements with other threads. These are listed in approximately
+ ascending order of strength.
+ 
+ @table @code
+ @item __ATOMIC_RELAXED
+ No barriers or synchronization.
+ @item __ATOMIC_CONSUME
+ Data dependency only for both barrier and synchronization with another
+ thread.
+ @item __ATOMIC_ACQUIRE
+ Barrier to hoisting of code and synchronizes with release (or stronger)
+ semantic stores from another thread.
+ @item __ATOMIC_RELEASE
+ Barrier to sinking of code and synchronizes with acquire (or stronger)
+ semantic loads from another thread.
+ @item __ATOMIC_ACQ_REL
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in another thread.
+ @item __ATOMIC_SEQ_CST
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in all threads.
+ @end table
+ 
+ When implementing patterns for these built-in functions, the memory model
+ parameter can be ignored as long as the pattern implements the most
+ restrictive @code{__ATOMIC_SEQ_CST} model. Any of the other memory models
+ will execute correctly with this memory model, but they may not execute as
+ efficiently as they could with a more appropriate implementation of the
+ relaxed requirements.
+ 
+ Note that the C++11 standard allows for the memory model parameter to be
+ determined at runtime rather than at compile time. These built-in
+ functions will map any runtime value to @code{__ATOMIC_SEQ_CST} rather
+ than invoke a runtime library call or inline a switch statement. This is
+ standard compliant, safe, and the simplest approach for now.
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_load_n (@var{type} *ptr, int memmodel)
+ This built-in function implements an atomic load operation. It returns the
+ contents of @code{*@var{ptr}}.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ and @code{__ATOMIC_CONSUME}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_load (@var{type} *ptr, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic load. It will return the
+ contents of @code{*@var{ptr}} in @code{*@var{ret}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_store_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic store operation. It writes
+ @var{val} into @code{*@var{ptr}}. On targets which are limited,
+ 0 may be the only valid value. This mimics the behaviour of
+ @code{__sync_lock_release} on such hardware.
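+ 
+ For example (a minimal sketch), a flag might be published to other
+ threads with release semantics:
+ 
+ @smallexample
+ int ready;  /* hypothetical flag read by other threads */
+ 
+ __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
+ @end smallexample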
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_store (@var{type} *ptr, @var{type} *val, int memmodel)
+ This is the generic version of an atomic store. It will store the value
+ of @code{*@var{val}} into @code{*@var{ptr}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_exchange_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic exchange operation. It writes
+ @var{val} into @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ On targets which are limited, a value of 1 may be the only valid value
+ written. This mimics the behaviour of @code{__sync_lock_test_and_set} on
+ such hardware.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ @code{__ATOMIC_RELEASE}, and @code{__ATOMIC_ACQ_REL}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_exchange (@var{type} *ptr, @var{type} *val, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic exchange. It will store the
+ contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value
+ of @code{*@var{ptr}} will be copied into @code{*@var{ret}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange_n (@var{type} *ptr, @var{type} *expected, @var{type} desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements an atomic compare and exchange operation.
+ This compares the contents of @code{*@var{ptr}} with the contents of
+ @code{*@var{expected}} and, if equal, writes @var{desired} into
+ @code{*@var{ptr}}. If they are not equal, the current contents of
+ @code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is
+ true for weak compare_exchange, which may fail spuriously, and false for
+ the strong variation, which never fails spuriously. Many targets only
+ offer the strong variation and ignore the parameter.
+ 
+ True is returned if @var{desired} is written into
+ @code{*@var{ptr}} and the execution is considered to conform to the
+ memory model specified by @var{success_memmodel}. There are no
+ restrictions on what memory model can be used here.
+ 
+ False is returned otherwise, and the execution is considered to conform
+ to @var{failure_memmodel}. This memory model cannot be
+ @code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a
+ stronger model than that specified by @var{success_memmodel}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange (@var{type} *ptr, @var{type} *expected, @var{type} *desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements the generic version of
+ @code{__atomic_compare_exchange}. The function is virtually identical to
+ @code{__atomic_compare_exchange_n}, except the desired value is also a
+ pointer.
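+ 
+ For illustration, both variants are typically used in a compare-and-swap
+ loop; a sketch using the @samp{_n} version (@code{v} is a hypothetical
+ shared variable):
+ 
+ @smallexample
+ long v;
+ 
+ void
+ double_v (void)
+ @{
+   long expected = __atomic_load_n (&v, __ATOMIC_RELAXED);
+   /* On failure, expected is updated with the current value,
+      so the new value can simply be recomputed and retried.  */
+   while (!__atomic_compare_exchange_n (&v, &expected, expected * 2,
+                                        0 /* strong */, __ATOMIC_SEQ_CST,
+                                        __ATOMIC_RELAXED))
+     ;
+ @}
+ @end smallexample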
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_add_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_sub_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_and_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_xor_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_or_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_nand_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the result of the operation. That is,
+ 
+ @smallexample
+ @{ *ptr @var{op}= val; return *ptr; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_fetch_add (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_sub (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_and (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_xor (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_or (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_nand (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the value that had previously been in @code{*@var{ptr}}. That is,
+ 
+ @smallexample
+ @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_thread_fence (int memmodel)
+ 
+ This built-in function acts as a synchronization fence between threads
+ based on the specified memory model.
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_signal_fence (int memmodel)
+ 
+ This built-in function acts as a synchronization fence between a thread
+ and signal handlers based in the same thread.
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size)
+ 
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target
+ architecture. Otherwise false is returned.
+ 
+ @var{size} must resolve to a compile-time constant.
+ 
+ @smallexample
+ if (__atomic_always_lock_free (sizeof (long long)))
+ @end smallexample
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size)
+ 
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target
+ architecture. If it is not known to be lock free, a call is made to a
+ runtime routine named @code{__atomic_is_lock_free}.
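+ 
+ A sketch of using this to select an implementation strategy at runtime
+ (@code{update_lock_free} and @code{update_with_mutex} are hypothetical
+ functions):
+ 
+ @smallexample
+ if (__atomic_is_lock_free (sizeof (long long)))
+   update_lock_free ();
+ else
+   update_with_mutex ();
+ @end smallexample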
+ + @end deftypefn + @node Object Size Checking @section Object Size Checking Builtins @findex __builtin_object_size Index: invoke.texi =================================================================== *** invoke.texi (revision 180839) --- invoke.texi (working copy) *************** The maximum number of conditional stores *** 9155,9165 **** --- 9155,9180 ---- if either vectorization (@option{-ftree-vectorize}) or if-conversion (@option{-ftree-loop-if-convert}) is disabled. The default is 2. + @item allow-load-data-races + Allow optimizers to introduce new data races on loads. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + @item allow-store-data-races Allow optimizers to introduce new data races on stores. Set to 1 to allow, otherwise to 0. This option is enabled by default unless implicitly set by the @option{-fmemory-model=} option. + @item allow-packed-load-data-races + Allow optimizers to introduce new data races on packed data loads. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + + @item allow-packed-store-data-races + Allow optimizers to introduce new data races on packed data stores. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + @item case-values-threshold The smallest number of different values for which it is best to use a jump-table instead of a tree of conditional branches. If the value is *************** This option will enable GCC to use CMPXC *** 13016,13022 **** CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of ! atomic built-in functions: see @ref{Atomic Builtins} for details. @item -msahf @opindex msahf --- 13031,13038 ---- CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of ! atomic built-in functions: see @ref{__sync Builtins} or ! @ref{__atomic Builtins} for details. @item -msahf @opindex msahf Index: md.texi =================================================================== *** md.texi (revision 180839) --- md.texi (working copy) *************** released only after all previous memory *** 5628,5633 **** --- 5628,5782 ---- If this pattern is not defined, then a @code{memory_barrier} pattern will be emitted, followed by a store of the value to the memory operand. + @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern + @item @samp{atomic_compare_and_swap@var{mode}} + This pattern, if defined, emits code for an atomic compare-and-swap + operation with memory model semantics. Operand 2 is the memory on which + the atomic operation is performed. Operand 0 is an output operand which + is set to true or false based on whether the operation succeeded. Operand + 1 is an output operand which is set to the contents of the memory before + the operation was attempted. Operand 3 is the value that is expected to + be in memory. Operand 4 is the value to put in memory if the expected + value is found there. Operand 5 is set to 1 if this compare and swap is to + be treated as a weak operation. 
Operand 6 is the memory model to be used
+ if the operation is a success. Operand 7 is the memory model to be used
+ if the operation fails.
+ 
+ If the memory referred to in operand 2 contains the value in operand 3,
+ then operand 4 is stored in the memory pointed to by operand 2 and
+ fencing based on the memory model in operand 6 is issued.
+ 
+ If the memory referred to in operand 2 does not contain the value in
+ operand 3, then fencing based on the memory model in operand 7 is issued.
+ 
+ If a target does not support weak compare-and-swap operations, or the port
+ elects not to implement weak operations, the argument in operand 5 can be
+ ignored. Note that a strong implementation must be provided.
+ 
+ If this pattern is not provided, the @code{__atomic_compare_exchange}
+ built-in functions will utilize the legacy @code{sync_compare_and_swap}
+ pattern with an @code{__ATOMIC_SEQ_CST} memory model.
+ 
+ @cindex @code{atomic_load@var{mode}} instruction pattern
+ @item @samp{atomic_load@var{mode}}
+ This pattern implements an atomic load operation with memory model
+ semantics. Operand 1 is the memory address being loaded from. Operand 0
+ is the result of the load. Operand 2 is the memory model to be used for
+ the load operation.
+ 
+ If not present, the @code{__atomic_load} built-in function will either
+ resort to a normal load with memory barriers, or a compare-and-swap
+ operation if a normal load would not be atomic.
+ 
+ @cindex @code{atomic_store@var{mode}} instruction pattern
+ @item @samp{atomic_store@var{mode}}
+ This pattern implements an atomic store operation with memory model
+ semantics. Operand 0 is the memory address being stored to. Operand 1
+ is the value to be written. Operand 2 is the memory model to be used for
+ the operation.
+ 
+ If not present, the @code{__atomic_store} built-in function will attempt to
+ perform a normal store and surround it with any required memory fences. If
+ the store would not be atomic, then an @code{__atomic_exchange} is
+ attempted with the result being ignored.
+ 
+ @cindex @code{atomic_exchange@var{mode}} instruction pattern
+ @item @samp{atomic_exchange@var{mode}}
+ This pattern implements an atomic exchange operation with memory model
+ semantics. Operand 1 is the memory location the operation is performed on.
+ Operand 0 is an output operand which is set to the original value contained
+ in the memory pointed to by operand 1. Operand 2 is the value to be
+ stored. Operand 3 is the memory model to be used.
+ 
+ If this pattern is not present, the built-in function
+ @code{__atomic_exchange} will attempt to perform the operation with a
+ compare-and-swap loop.
+ 
+ @cindex @code{atomic_add@var{mode}} instruction pattern
+ @cindex @code{atomic_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_or@var{mode}} instruction pattern
+ @cindex @code{atomic_and@var{mode}} instruction pattern
+ @cindex @code{atomic_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_nand@var{mode}} instruction pattern
+ @item @samp{atomic_add@var{mode}}, @samp{atomic_sub@var{mode}}
+ @itemx @samp{atomic_or@var{mode}}, @samp{atomic_and@var{mode}}
+ @itemx @samp{atomic_xor@var{mode}}, @samp{atomic_nand@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics. Operand 0 is the memory on which the atomic operation is
+ performed. Operand 1 is the second operand to the binary operator.
+ Operand 2 is the memory model to be used by the operation.
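+ 
+ These patterns do not produce a result; they are most useful when the
+ return value of the corresponding built-in function is unused, as in
+ this sketch (@code{counter} is a hypothetical shared variable):
+ 
+ @smallexample
+ /* The fetched result is discarded, so a pattern which
+    does not compute one may be selected.  */
+ __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
+ @end smallexample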
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return a result. If
+ none of these are available, a compare-and-swap loop will be used.
+ 
+ @cindex @code{atomic_fetch_add@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_or@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_and@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_nand@var{mode}} instruction pattern
+ @item @samp{atomic_fetch_add@var{mode}}, @samp{atomic_fetch_sub@var{mode}}
+ @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}}
+ @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics, and return the original value. Operand 0 is an output
+ operand which contains the value of the memory location before the
+ operation was performed. Operand 1 is the memory on which the atomic
+ operation is performed. Operand 2 is the second operand to the binary
+ operator. Operand 3 is the memory model to be used by the operation.
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns. If none of these are available, a compare-and-swap
+ loop will be used.
+ 
+ @cindex @code{atomic_add_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_sub_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_or_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_and_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_xor_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_nand_fetch@var{mode}} instruction pattern
+ @item @samp{atomic_add_fetch@var{mode}}, @samp{atomic_sub_fetch@var{mode}}
+ @itemx @samp{atomic_or_fetch@var{mode}}, @samp{atomic_and_fetch@var{mode}}
+ @itemx @samp{atomic_xor_fetch@var{mode}}, @samp{atomic_nand_fetch@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics and return the result after the operation is performed.
+ Operand 0 is an output operand which contains the value after the
+ operation. Operand 1 is the memory on which the atomic operation is
+ performed. Operand 2 is the second operand to the binary operator.
+ Operand 3 is the memory model to be used by the operation.
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return the result before
+ the operation, followed by the arithmetic operation required to produce the
+ result. If none of these are available, a compare-and-swap loop will be
+ used.
+ 
+ @cindex @code{mem_thread_fence@var{mode}} instruction pattern
+ @item @samp{mem_thread_fence@var{mode}}
+ This pattern emits code required to implement a thread fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+ 
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+ 
+ @cindex @code{mem_signal_fence@var{mode}} instruction pattern
+ @item @samp{mem_signal_fence@var{mode}}
+ This pattern emits code required to implement a signal fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+ 
+ This pattern should impact the compiler optimizers the same way that
+ @code{mem_thread_fence} does, but it does not need to issue any barrier
+ instructions.
+ 
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+ 
  @cindex @code{stack_protect_set} instruction pattern
  @item @samp{stack_protect_set}