On 11/04/2011 04:34 PM, Joseph S. Myers wrote:
> Likewise.
I think I got all those, plus a couple more I noticed along the way.
	* doc/extend.texi: Document __atomic built-in functions.
	* doc/invoke.texi: Document data race parameters.
	* doc/md.texi: Document atomic patterns.

Index: extend.texi
===================================================================
*** extend.texi	(revision 180839)
--- extend.texi	(working copy)
*************** extensions, accepted by GCC in C90 mode
*** 79,85 ****
  * Return Address:: Getting the return or frame address of a function.
  * Vector Extensions:: Using vector instructions through built-in functions.
  * Offsetof:: Special syntax for implementing @code{offsetof}.
! * Atomic Builtins:: Built-in functions for atomic memory access.
  * Object Size Checking:: Built-in functions for limited buffer overflow
  checking.
  * Other Builtins:: Other built-in functions.
--- 79,86 ----
  * Return Address:: Getting the return or frame address of a function.
  * Vector Extensions:: Using vector instructions through built-in functions.
  * Offsetof:: Special syntax for implementing @code{offsetof}.
! * __sync Builtins:: Legacy built-in functions for atomic memory access.
! * __atomic Builtins:: Atomic built-in functions with memory model.
  * Object Size Checking:: Built-in functions for limited buffer overflow
  checking.
  * Other Builtins:: Other built-in functions.
*************** is a suitable definition of the @code{of
*** 6682,6689 ****
  may be dependent. In either case, @var{member} may consist of a single
  identifier, or a sequence of member accesses and array references.
  
! @node Atomic Builtins
! @section Built-in functions for atomic memory access
  
  The following builtins are intended to be compatible with those described
  in the @cite{Intel Itanium Processor-specific Application Binary Interface},
--- 6683,6690 ----
  may be dependent. In either case, @var{member} may consist of a single
  identifier, or a sequence of member accesses and array references.
  
! @node __sync Builtins
! @section Legacy __sync built-in functions for atomic memory access
  
  The following builtins are intended to be compatible with those described
  in the @cite{Intel Itanium Processor-specific Application Binary Interface},
*************** previous memory loads have been satisfie
*** 6815,6820 ****
--- 6816,7053 ----
  are not prevented from being speculated to before the barrier.
  @end table
+ 
+ @node __atomic Builtins
+ @section Built-in functions for memory model aware atomic operations
+ 
+ The following built-in functions approximately match the requirements for
+ the C++11 memory model. Many are similar to the @samp{__sync} prefixed
+ built-in functions, but all also have a memory model parameter. These are
+ all identified by being prefixed with @samp{__atomic}, and most are
+ overloaded such that they work with multiple types.
+ 
+ GCC will allow any integral scalar or pointer type that is 1, 2, 4, or 8
+ bytes in length. 16-byte integral types are also allowed if
+ @samp{__int128} (@pxref{__int128}) is supported by the architecture.
+ 
+ Target architectures are encouraged to provide their own patterns for
+ each of these built-in functions. If no target pattern is provided, the
+ original non-memory model set of @samp{__sync} atomic built-in functions
+ will be utilized, along with any required synchronization fences
+ surrounding it in order to achieve the proper behaviour. Execution in
+ this case is subject to the same restrictions as those built-in functions.
+ 
+ If there is no pattern or mechanism to provide a lock free instruction
+ sequence, a call is made to an external routine with the same parameters
+ to be resolved at runtime.
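+ 
+ As a brief illustration (a sketch only; the individual built-in
+ functions and memory models are documented below), a simple event
+ counter with no ordering requirements might be updated as follows:
+ 
+ @smallexample
+ long counter;  /* hypothetical shared counter */
+ 
+ void
+ count_event (void)
+ @{
+   /* No ordering with other memory operations is needed,
+      so the relaxed model suffices.  */
+   __atomic_add_fetch (&counter, 1, __ATOMIC_RELAXED);
+ @}
+ @end smallexample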
+ 
+ The four non-arithmetic functions (load, store, exchange, and
+ compare_exchange) all have a generic version as well. This generic
+ version will work on any data type. If the data type size maps to one
+ of the integral sizes which may have lock free support, the generic
+ version will utilize the lock free built-in function. Otherwise an
+ external call is left to be resolved at runtime. This external call
+ uses the same format, with the addition of a @samp{size_t} parameter
+ inserted as the first parameter indicating the size of the object
+ being pointed to. All objects must be the same size.
+ 
+ There are 6 different memory models which can be specified. These map
+ to the same names in the C++11 standard. Refer there or to the
+ @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
+ atomic synchronization} for more detailed definitions. These memory
+ models integrate both barriers to code motion as well as synchronization
+ requirements with other threads. These are listed in approximately
+ ascending order of strength.
+ 
+ @table @code
+ @item __ATOMIC_RELAXED
+ No barriers or synchronization.
+ @item __ATOMIC_CONSUME
+ Data dependency only for both barrier and synchronization with another
+ thread.
+ @item __ATOMIC_ACQUIRE
+ Barrier to hoisting of code and synchronizes with release (or stronger)
+ semantic stores from another thread.
+ @item __ATOMIC_RELEASE
+ Barrier to sinking of code and synchronizes with acquire (or stronger)
+ semantic loads from another thread.
+ @item __ATOMIC_ACQ_REL
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in another thread.
+ @item __ATOMIC_SEQ_CST
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in all threads.
+ @end table
+ 
+ When implementing patterns for these built-in functions, the memory model
+ parameter can be ignored as long as the pattern implements the most
+ restrictive @code{__ATOMIC_SEQ_CST} model. Any of the other memory models
+ will execute correctly with this memory model, but they may not execute as
+ efficiently as they could with a more appropriate implementation of the
+ relaxed requirements.
+ 
+ Note that the C++11 standard allows for the memory model parameter to be
+ determined at runtime rather than at compile time. These built-in
+ functions will map any runtime value to @code{__ATOMIC_SEQ_CST} rather
+ than invoke a runtime library call or inline a switch statement. This is
+ standard compliant, safe, and the simplest approach for now.
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_load_n (@var{type} *ptr, int memmodel)
+ This built-in function implements an atomic load operation. It returns the
+ contents of @code{*@var{ptr}}.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ and @code{__ATOMIC_CONSUME}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_load (@var{type} *ptr, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic load. It will return the
+ contents of @code{*@var{ptr}} in @code{*@var{ret}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_store_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic store operation. It writes
+ @var{val} into @code{*@var{ptr}}. On targets which are limited,
+ 0 may be the only valid value. This mimics the behaviour of
+ @code{__sync_lock_release} on such hardware.
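+ 
+ For example (a minimal sketch), a flag might be published to other
+ threads with release semantics:
+ 
+ @smallexample
+ int ready;  /* hypothetical flag read by other threads */
+ 
+ __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
+ @end smallexample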
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_store (@var{type} *ptr, @var{type} *val, int memmodel)
+ This is the generic version of an atomic store. It will store the value
+ of @code{*@var{val}} into @code{*@var{ptr}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_exchange_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic exchange operation. It writes
+ @var{val} into @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ On targets which are limited, a value of 1 may be the only valid value
+ written. This mimics the behaviour of @code{__sync_lock_test_and_set} on
+ such hardware.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ @code{__ATOMIC_RELEASE}, and @code{__ATOMIC_ACQ_REL}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_exchange (@var{type} *ptr, @var{type} *val, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic exchange. It will store the
+ contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value
+ of @code{*@var{ptr}} will be copied into @code{*@var{ret}}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange_n (@var{type} *ptr, @var{type} *expected, @var{type} desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements an atomic compare and exchange operation.
+ This compares the contents of @code{*@var{ptr}} with the contents of
+ @code{*@var{expected}} and, if equal, writes @var{desired} into
+ @code{*@var{ptr}}. If they are not equal, the current contents of
+ @code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is
+ true for weak compare_exchange, which may fail spuriously, and false for
+ the strong variation, which never fails spuriously. Many targets only
+ offer the strong variation and ignore the parameter.
+ 
+ True is returned if @var{desired} is written into
+ @code{*@var{ptr}} and the execution is considered to conform to the
+ memory model specified by @var{success_memmodel}. There are no
+ restrictions on what memory model can be used here.
+ 
+ False is returned otherwise, and the execution is considered to conform
+ to @var{failure_memmodel}. This memory model cannot be
+ @code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a
+ stronger model than that specified by @var{success_memmodel}.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange (@var{type} *ptr, @var{type} *expected, @var{type} *desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements the generic version of
+ @code{__atomic_compare_exchange}. The function is virtually identical to
+ @code{__atomic_compare_exchange_n}, except the desired value is also a
+ pointer.
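+ 
+ For illustration, both variants are typically used in a compare-and-swap
+ loop; a sketch using the @samp{_n} version (@code{v} is a hypothetical
+ shared variable):
+ 
+ @smallexample
+ long v;
+ 
+ void
+ double_v (void)
+ @{
+   long expected = __atomic_load_n (&v, __ATOMIC_RELAXED);
+   /* On failure, expected is updated with the current value,
+      so the new value can simply be recomputed and retried.  */
+   while (!__atomic_compare_exchange_n (&v, &expected, expected * 2,
+                                        0 /* strong */, __ATOMIC_SEQ_CST,
+                                        __ATOMIC_RELAXED))
+     ;
+ @}
+ @end smallexample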
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_add_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_sub_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_and_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_xor_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_or_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_nand_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the result of the operation. That is,
+ 
+ @smallexample
+ @{ *ptr @var{op}= val; return *ptr; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} @var{type} __atomic_fetch_add (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_sub (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_and (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_xor (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_or (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_nand (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the value that had previously been in @code{*@var{ptr}}. That is,
+ 
+ @smallexample
+ @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_thread_fence (int memmodel)
+ 
+ This built-in function acts as a synchronization fence between threads
+ based on the specified memory model.
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} void __atomic_signal_fence (int memmodel)
+ 
+ This built-in function acts as a synchronization fence between a thread
+ and signal handlers based in the same thread.
+ 
+ All memory models are valid.
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size)
+ 
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target
+ architecture. Otherwise false is returned.
+ 
+ @var{size} must resolve to a compile-time constant.
+ 
+ @smallexample
+ if (__atomic_always_lock_free (sizeof (long long)))
+ @end smallexample
+ 
+ @end deftypefn
+ 
+ @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size)
+ 
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target
+ architecture. If it is not known to be lock free, a call is made to a
+ runtime routine named @code{__atomic_is_lock_free}.
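+ 
+ A sketch of using this to select an implementation strategy at runtime
+ (@code{update_lock_free} and @code{update_with_mutex} are hypothetical
+ functions):
+ 
+ @smallexample
+ if (__atomic_is_lock_free (sizeof (long long)))
+   update_lock_free ();
+ else
+   update_with_mutex ();
+ @end smallexample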
+ + @end deftypefn + @node Object Size Checking @section Object Size Checking Builtins @findex __builtin_object_size Index: invoke.texi =================================================================== *** invoke.texi (revision 180839) --- invoke.texi (working copy) *************** The maximum number of conditional stores *** 9155,9165 **** --- 9155,9180 ---- if either vectorization (@option{-ftree-vectorize}) or if-conversion (@option{-ftree-loop-if-convert}) is disabled. The default is 2. + @item allow-load-data-races + Allow optimizers to introduce new data races on loads. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + @item allow-store-data-races Allow optimizers to introduce new data races on stores. Set to 1 to allow, otherwise to 0. This option is enabled by default unless implicitly set by the @option{-fmemory-model=} option. + @item allow-packed-load-data-races + Allow optimizers to introduce new data races on packed data loads. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + + @item allow-packed-store-data-races + Allow optimizers to introduce new data races on packed data stores. + Set to 1 to allow, otherwise to 0. This option is enabled by default + unless implicitly set by the @option{-fmemory-model=} option. + @item case-values-threshold The smallest number of different values for which it is best to use a jump-table instead of a tree of conditional branches. If the value is *************** This option will enable GCC to use CMPXC *** 13016,13022 **** CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of ! atomic built-in functions: see @ref{Atomic Builtins} for details. @item -msahf @opindex msahf --- 13031,13038 ---- CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of ! atomic built-in functions: see @ref{__sync Builtins} or ! @ref{__atomic Builtins} for details. @item -msahf @opindex msahf Index: md.texi =================================================================== *** md.texi (revision 180839) --- md.texi (working copy) *************** released only after all previous memory *** 5628,5633 **** --- 5628,5782 ---- If this pattern is not defined, then a @code{memory_barrier} pattern will be emitted, followed by a store of the value to the memory operand. + @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern + @item @samp{atomic_compare_and_swap@var{mode}} + This pattern, if defined, emits code for an atomic compare-and-swap + operation with memory model semantics. Operand 2 is the memory on which + the atomic operation is performed. Operand 0 is an output operand which + is set to true or false based on whether the operation succeeded. Operand + 1 is an output operand which is set to the contents of the memory before + the operation was attempted. Operand 3 is the value that is expected to + be in memory. Operand 4 is the value to put in memory if the expected + value is found there. Operand 5 is set to 1 if this compare and swap is to + be treated as a weak operation. 
Operand 6 is the memory model to be used
+ if the operation is a success. Operand 7 is the memory model to be used
+ if the operation fails.
+ 
+ If the memory referred to in operand 2 contains the value in operand 3,
+ then operand 4 is stored in the memory pointed to by operand 2 and
+ fencing based on the memory model in operand 6 is issued.
+ 
+ If the memory referred to in operand 2 does not contain the value in
+ operand 3, then fencing based on the memory model in operand 7 is issued.
+ 
+ If a target does not support weak compare-and-swap operations, or the port
+ elects not to implement weak operations, the argument in operand 5 can be
+ ignored. Note that a strong implementation must be provided.
+ 
+ If this pattern is not provided, the @code{__atomic_compare_exchange}
+ built-in functions will utilize the legacy @code{sync_compare_and_swap}
+ pattern with an @code{__ATOMIC_SEQ_CST} memory model.
+ 
+ @cindex @code{atomic_load@var{mode}} instruction pattern
+ @item @samp{atomic_load@var{mode}}
+ This pattern implements an atomic load operation with memory model
+ semantics. Operand 1 is the memory address being loaded from. Operand 0
+ is the result of the load. Operand 2 is the memory model to be used for
+ the load operation.
+ 
+ If not present, the @code{__atomic_load} built-in function will either
+ resort to a normal load with memory barriers, or a compare-and-swap
+ operation if a normal load would not be atomic.
+ 
+ @cindex @code{atomic_store@var{mode}} instruction pattern
+ @item @samp{atomic_store@var{mode}}
+ This pattern implements an atomic store operation with memory model
+ semantics. Operand 0 is the memory address being stored to. Operand 1
+ is the value to be written. Operand 2 is the memory model to be used for
+ the operation.
+ 
+ If not present, the @code{__atomic_store} built-in function will attempt to
+ perform a normal store and surround it with any required memory fences. If
+ the store would not be atomic, then an @code{__atomic_exchange} is
+ attempted with the result being ignored.
+ 
+ @cindex @code{atomic_exchange@var{mode}} instruction pattern
+ @item @samp{atomic_exchange@var{mode}}
+ This pattern implements an atomic exchange operation with memory model
+ semantics. Operand 1 is the memory location the operation is performed on.
+ Operand 0 is an output operand which is set to the original value contained
+ in the memory pointed to by operand 1. Operand 2 is the value to be
+ stored. Operand 3 is the memory model to be used.
+ 
+ If this pattern is not present, the built-in function
+ @code{__atomic_exchange} will attempt to perform the operation with a
+ compare-and-swap loop.
+ 
+ @cindex @code{atomic_add@var{mode}} instruction pattern
+ @cindex @code{atomic_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_or@var{mode}} instruction pattern
+ @cindex @code{atomic_and@var{mode}} instruction pattern
+ @cindex @code{atomic_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_nand@var{mode}} instruction pattern
+ @item @samp{atomic_add@var{mode}}, @samp{atomic_sub@var{mode}}
+ @itemx @samp{atomic_or@var{mode}}, @samp{atomic_and@var{mode}}
+ @itemx @samp{atomic_xor@var{mode}}, @samp{atomic_nand@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics. Operand 0 is the memory on which the atomic operation is
+ performed. Operand 1 is the second operand to the binary operator.
+ Operand 2 is the memory model to be used by the operation.
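+ 
+ These patterns do not produce a result; they are most useful when the
+ return value of the corresponding built-in function is unused, as in
+ this sketch (@code{counter} is a hypothetical shared variable):
+ 
+ @smallexample
+ /* The fetched result is discarded, so a pattern which
+    does not compute one may be selected.  */
+ __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
+ @end smallexample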
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return a result. If
+ none of these are available, a compare-and-swap loop will be used.
+ 
+ @cindex @code{atomic_fetch_add@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_or@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_and@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_nand@var{mode}} instruction pattern
+ @item @samp{atomic_fetch_add@var{mode}}, @samp{atomic_fetch_sub@var{mode}}
+ @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}}
+ @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics, and return the original value. Operand 0 is an output
+ operand which contains the value of the memory location before the
+ operation was performed. Operand 1 is the memory on which the atomic
+ operation is performed. Operand 2 is the second operand to the binary
+ operator. Operand 3 is the memory model to be used by the operation.
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns. If none of these are available, a compare-and-swap
+ loop will be used.
+ 
+ @cindex @code{atomic_add_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_sub_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_or_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_and_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_xor_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_nand_fetch@var{mode}} instruction pattern
+ @item @samp{atomic_add_fetch@var{mode}}, @samp{atomic_sub_fetch@var{mode}}
+ @itemx @samp{atomic_or_fetch@var{mode}}, @samp{atomic_and_fetch@var{mode}}
+ @itemx @samp{atomic_xor_fetch@var{mode}}, @samp{atomic_nand_fetch@var{mode}}
+ 
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics and return the result after the operation is performed.
+ Operand 0 is an output operand which contains the value after the
+ operation. Operand 1 is the memory on which the atomic operation is
+ performed. Operand 2 is the second operand to the binary operator.
+ Operand 3 is the memory model to be used by the operation.
+ 
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return the result before
+ the operation, followed by the arithmetic operation required to produce the
+ result. If none of these are available, a compare-and-swap loop will be
+ used.
+ 
+ @cindex @code{mem_thread_fence@var{mode}} instruction pattern
+ @item @samp{mem_thread_fence@var{mode}}
+ This pattern emits code required to implement a thread fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+ 
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+ 
+ @cindex @code{mem_signal_fence@var{mode}} instruction pattern
+ @item @samp{mem_signal_fence@var{mode}}
+ This pattern emits code required to implement a signal fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+ 
+ This pattern should impact the compiler optimizers the same way that
+ @code{mem_thread_fence} does, but it does not need to issue any barrier
+ instructions.
+ 
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+ 
  @cindex @code{stack_protect_set} instruction pattern
  @item @samp{stack_protect_set}