On Tue, Jul 30, 2019 at 08:00:57PM -0300, Mauro Carvalho Chehab wrote:
> On Tue, 30 Jul 2019 18:21:30 -0400
> Joel Fernandes <j...@joelfernandes.org> wrote:
> 
> > On Tue, Jul 30, 2019 at 07:00:28PM -0300, Mauro Carvalho Chehab wrote:
> > > On Tue, 30 Jul 2019 17:50:07 -0400
> > > Joel Fernandes <j...@joelfernandes.org> wrote:
> > >   
> > > > On Tue, Jul 30, 2019 at 02:22:50PM -0700, Paul E. McKenney wrote:  
> > > > > On Fri, Jul 26, 2019 at 09:51:35AM -0300, Mauro Carvalho Chehab wrote:
> > > > > > There are 4 RCU articles that are written in HTML format.
> > > > > > 
> > > > > > The way they are, they can't be part of the Linux Kernel
> > > > > > documentation body nor share the styles and pdf output.
> > > > > > 
> > > > > > So, convert them to ReST format.
> > > > > > 
> > > > > > This way, make htmldocs and make pdfdocs will produce a
> > > > > > documentation output that will be like the original ones, but
> > > > > > will be part of the Linux Kernel documentation body.
> > > > > > 
> > > > > > Part of the conversion was done with the help of pandoc, but
> > > > > > the result had some broken things that had to be manually
> > > > > > fixed.
> > > > > > 
> > > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+sams...@kernel.org>   
> > > > > >  
> > > > > 
> > > > > I am having some trouble applying these, at least in part due to UTF-8
> > > > > sequences, for example double left quotation mark.  These end up being
> > > > > "=E2=80=9C", with a few space characters turned into "=20".
> > > > > 
> > > > > Any advice on how to apply these?  Should I just pull commits from
> > > > > somewhere?    
> > > > 
> > > > I was able to successfully apply and build this particular patch. I think
> > > > this is the only one in the series that applies to RCU.
> > > > 
> > > > Sadly, I can't find the patch in any of the public archives, but I could
> > > > perhaps email it to you as an .mbox attachment, which 'git am' should be
> > > > able to apply.
> > > > 
> > > > Mauro did say he was going to add some more details to the changelog, or
> > > > it could be added when the patch is applied:
> > > > https://lore.kernel.org/rcu/20190726154550.5eeae...@coco.lan/
> > > 
> > > Yeah, I'm also planning to address at least some of the issues you pointed
> > > out, in order to improve the html output, but I got sidetracked by
> > > something else and haven't found time to finish yet. I'm adding some CI
> > > automation for the media subsystem to help us deal with the huge number of
> > > patches we receive there.
> > > 
> > > Feel free to add those details to the changelog. I may find some spare
> > > time this week or next for the improvements you suggested, but those could
> > > also be sent as follow-up patches once they are done.
> > 
> > Ok, I will re-send this RCU patch with the changes, so leave this one to me.
> > 
> > The other patch, for the memory model, needs a lot more work, so we can set
> > that aside for now until someone has the time.
> 
> Yeah, feel free to do what fits best with regards to this one.
> 
> If you prefer to merge it instead, I'm enclosing the last version of it,
> with that quick hack I just wrote, plus the change of the toctree for
> it to produce the 2-level index.

Ok, thanks a lot Mauro! It looks much better to me now. I can probably send
this one out to Paul as well once the RCU ones are in, after some more careful
review.

However, I am curious what Paul and Alan think about it, since some folks such
as PeterZ actively deal with memory-model stuff and probably refer to this
document a lot, but don't like ReST (unlike us) and prefer plain-text editors
without any markup. Hopefully that is no longer the case.
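
As an aside, the "=E2=80=9C" sequences Paul ran into are just the
quoted-printable encoding of UTF-8 curly quotes, and "=20" is an encoded
space, so the patches themselves are probably fine; 'git am' normally undoes
that encoding itself when the mbox still carries its Content-Transfer-Encoding
header, which may be why a saved-as-text copy fails to apply while the raw
mbox applies cleanly. Purely as an illustrative sketch (this snippet is mine,
not part of any patch), Python's stdlib quopri module shows what such a
sequence decodes to:

```python
import quopri

# "=E2=80=9C" is quoted-printable for the UTF-8 bytes 0xE2 0x80 0x9C,
# i.e. U+201C LEFT DOUBLE QUOTATION MARK; "=20" encodes a plain space.
raw = b"=E2=80=9Cdouble=20quoted=E2=80=9D"
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)  # -> “double quoted”
```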

thanks,

 - Joel


> Thanks,
> Mauro
> 
> [PATCH] tools: memory-model: add it to the Documentation body
> 
> The books at tools/memory-model/Documentation are very well
> formatted. Congrats to those who wrote them!
> 
> The manual conversion to ReST is really trivial:
> 
>       - Add document titles;
>       - change the bullets on some lists;
>       - mark code blocks.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+sams...@kernel.org>
> 
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> index 976e85adffe8..7e6500a6fa76 100644
> --- a/Documentation/core-api/refcount-vs-atomic.rst
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -17,7 +17,7 @@ in order to help maintainers validate their code against the change in
>  these memory ordering guarantees.
>  
>  The terms used through this document try to follow the formal LKMM defined in
> -tools/memory-model/Documentation/explanation.txt.
> +tools/memory-model/Documentation/explanation.rst.
>  
>  memory-barriers.txt and atomic_t.txt provide more background to the
>  memory ordering in general and for atomic operations specifically.
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 2df5a3da563c..5123770ba77e 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -36,6 +36,7 @@ trying to get it to work optimally on a given system.
>  
>     admin-guide/index
>     kbuild/index
> +   tools/index
>  
>  Firmware-related documentation
>  ------------------------------
> diff --git a/Documentation/tools/index.rst b/Documentation/tools/index.rst
> new file mode 100644
> index 000000000000..f540d9cc75a1
> --- /dev/null
> +++ b/Documentation/tools/index.rst
> @@ -0,0 +1,17 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========
> +Linux Tools
> +===========
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   memory-model/index
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> diff --git a/Documentation/tools/memory-model b/Documentation/tools/memory-model
> new file mode 120000
> index 000000000000..020ed38ce302
> --- /dev/null
> +++ b/Documentation/tools/memory-model
> @@ -0,0 +1 @@
> +../../tools/memory-model/Documentation/
> \ No newline at end of file
> diff --git a/tools/memory-model/Documentation/cheatsheet.rst b/tools/memory-model/Documentation/cheatsheet.rst
> new file mode 100644
> index 000000000000..35f8749b8a53
> --- /dev/null
> +++ b/tools/memory-model/Documentation/cheatsheet.rst
> @@ -0,0 +1,36 @@
> +===========
> +Cheat Sheet
> +===========
> +
> +::
> +
> +                                 Prior Operation     Subsequent Operation
> +                                 ---------------  ---------------------------
> +                              C  Self  R  W  RMW  Self  R  W  DR  DW  RMW  SV
> +                             --  ----  -  -  ---  ----  -  -  --  --  ---  --
> +
> +  Store, e.g., WRITE_ONCE()            Y                                       Y
> +  Load, e.g., READ_ONCE()              Y                          Y   Y        Y
> +  Unsuccessful RMW operation           Y                          Y   Y        Y
> +  rcu_dereference()                    Y                          Y   Y        Y
> +  Successful *_acquire()               R                   Y  Y   Y   Y    Y   Y
> +  Successful *_release()         C        Y  Y    Y     W                      Y
> +  smp_rmb()                               Y       R        Y      Y        R
> +  smp_wmb()                                  Y    W           Y       Y    W
> +  smp_mb() & synchronize_rcu()  CP        Y  Y    Y        Y  Y   Y   Y    Y
> +  Successful full non-void RMW  CP     Y  Y  Y    Y     Y  Y  Y   Y   Y    Y   Y
> +  smp_mb__before_atomic()       CP        Y  Y    Y        a  a   a   a    Y
> +  smp_mb__after_atomic()        CP        a  a    Y        Y  Y   Y   Y    Y
> +
> +
> +  Key:    C:      Ordering is cumulative
> +          P:      Ordering propagates
> +          R:      Read, for example, READ_ONCE(), or read portion of RMW
> +          W:      Write, for example, WRITE_ONCE(), or write portion of RMW
> +          Y:      Provides ordering
> +          a:      Provides ordering given intervening RMW atomic operation
> +          DR:     Dependent read (address dependency)
> +          DW:     Dependent write (address, data, or control dependency)
> +          RMW:    Atomic read-modify-write operation
> +          SELF:   Orders self, as opposed to accesses before and/or after
> +          SV:     Orders later accesses to the same variable
> diff --git a/tools/memory-model/Documentation/cheatsheet.txt b/tools/memory-model/Documentation/cheatsheet.txt
> deleted file mode 100644
> index 33ba98d72b16..000000000000
> --- a/tools/memory-model/Documentation/cheatsheet.txt
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -                                  Prior Operation     Subsequent Operation
> -                                  ---------------  ---------------------------
> -                               C  Self  R  W  RMW  Self  R  W  DR  DW  RMW  SV
> -                              --  ----  -  -  ---  ----  -  -  --  --  ---  --
> -
> -Store, e.g., WRITE_ONCE()            Y                                       Y
> -Load, e.g., READ_ONCE()              Y                          Y   Y        Y
> -Unsuccessful RMW operation           Y                          Y   Y        Y
> -rcu_dereference()                    Y                          Y   Y        Y
> -Successful *_acquire()               R                   Y  Y   Y   Y    Y   Y
> -Successful *_release()         C        Y  Y    Y     W                      Y
> -smp_rmb()                               Y       R        Y      Y        R
> -smp_wmb()                                  Y    W           Y       Y    W
> -smp_mb() & synchronize_rcu()  CP        Y  Y    Y        Y  Y   Y   Y    Y
> -Successful full non-void RMW  CP     Y  Y  Y    Y     Y  Y  Y   Y   Y    Y   Y
> -smp_mb__before_atomic()       CP        Y  Y    Y        a  a   a   a    Y
> -smp_mb__after_atomic()        CP        a  a    Y        Y  Y   Y   Y    Y
> -
> -
> -Key: C:      Ordering is cumulative
> -     P:      Ordering propagates
> -     R:      Read, for example, READ_ONCE(), or read portion of RMW
> -     W:      Write, for example, WRITE_ONCE(), or write portion of RMW
> -     Y:      Provides ordering
> -     a:      Provides ordering given intervening RMW atomic operation
> -     DR:     Dependent read (address dependency)
> -     DW:     Dependent write (address, data, or control dependency)
> -     RMW:    Atomic read-modify-write operation
> -     SELF:   Orders self, as opposed to accesses before and/or after
> -     SV:     Orders later accesses to the same variable
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.rst
> similarity index 92%
> rename from tools/memory-model/Documentation/explanation.txt
> rename to tools/memory-model/Documentation/explanation.rst
> index 68caa9a976d0..9b5d10cef0c2 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.rst
> @@ -1,5 +1,6 @@
> +========================================================
>  Explanation of the Linux-Kernel Memory Consistency Model
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +========================================================
>  
>  :Author: Alan Stern <st...@rowland.harvard.edu>
>  :Created: October 2017
> @@ -107,7 +108,7 @@ and the flag are memory locations shared between the two CPUs.
>  
>  We can abstract out the important pieces of the driver code as follows
>  (the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
> -assignment statements is discussed later):
> +assignment statements is discussed later)::
>  
>       int buf = 0, flag = 0;
>  
> @@ -219,7 +220,7 @@ Ordering).  This model predicts that the undesired outcome for the MP
>  pattern cannot occur, but in other respects it differs from Sequential
>  Consistency.  One example is the Store Buffer (SB) pattern, in which
>  each CPU stores to its own shared location and then loads from the
> -other CPU's location:
> +other CPU's location::
>  
>       int x = 0, y = 0;
>  
> @@ -264,7 +265,7 @@ The counterpart to ordering is a cycle.  Ordering rules out cycles:
>  It's not possible to have X ordered before Y, Y ordered before Z, and
>  Z ordered before X, because this would mean that X is ordered before
>  itself.  The analysis of the MP example under Sequential Consistency
> -involved just such an impossible cycle:
> +involved just such an impossible cycle::
>  
>       W: P0 stores 1 to flag   executes before
>       X: P1 loads 1 from flag  executes before
> @@ -331,7 +332,7 @@ can think of it as the order in which statements occur in the source
>  code after branches are taken into account and loops have been
>  unrolled.  A better description might be the order in which
>  instructions are presented to a CPU's execution unit.  Thus, we say
> -that X is po-before Y (written as "X ->po Y" in formulas) if X occurs
> +that X is po-before Y (written as ``X ->po Y`` in formulas) if X occurs
>  before Y in the instruction stream.
>  
>  This is inherently a single-CPU relation; two instructions executing
> @@ -388,7 +389,7 @@ The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
>  not perfect; and under some circumstances it is possible for the
>  compiler to undermine the memory model.  Here is an example.  Suppose
>  both branches of an "if" statement store the same value to the same
> -location:
> +location::
>  
>       r1 = READ_ONCE(x);
>       if (r1) {
> @@ -402,7 +403,7 @@ location:
>  For this code, the LKMM predicts that the load from x will always be
>  executed before either of the stores to y.  However, a compiler could
>  lift the stores out of the conditional, transforming the code into
> -something resembling:
> +something resembling::
>  
>       r1 = READ_ONCE(x);
>       WRITE_ONCE(y, 2);
> @@ -418,7 +419,7 @@ model's original prediction could be invalidated by the compiler.
>  
>  Another issue arises from the fact that in C, arguments to many
>  operators and function calls can be evaluated in any order.  For
> -example:
> +example::
>  
>       r1 = f(5) + g(6);
>  
> @@ -440,7 +441,7 @@ control (ctrl).
>  
>  A read and a write event are linked by a data dependency if the value
>  obtained by the read affects the value stored by the write.  As a very
> -simple example:
> +simple example::
>  
>       int x, y;
>  
> @@ -455,7 +456,7 @@ values of multiple reads.
>  A read event and another memory access event are linked by an address
>  dependency if the value obtained by the read affects the location
>  accessed by the other event.  The second event can be either a read or
> -a write.  Here's another simple example:
> +a write.  Here's another simple example::
>  
>       int a[20];
>       int i;
> @@ -471,7 +472,7 @@ pointer.
>  
>  Finally, a read event and another memory access event are linked by a
>  control dependency if the value obtained by the read affects whether
> -the second event is executed at all.  Simple example:
> +the second event is executed at all.  Simple example::
>  
>       int x, y;
>  
> @@ -484,9 +485,9 @@ which depends on the value obtained by the READ_ONCE(); hence there is
>  a control dependency from the load to the store.
>  
>  It should be pretty obvious that events can only depend on reads that
> -come earlier in program order.  Symbolically, if we have R ->data X,
> -R ->addr X, or R ->ctrl X (where R is a read event), then we must also
> -have R ->po X.  It wouldn't make sense for a computation to depend
> +come earlier in program order.  Symbolically, if we have ``R ->data X``,
> +``R ->addr X``, or ``R ->ctrl X`` (where R is a read event), then we must also
> +have ``R ->po X``.  It wouldn't make sense for a computation to depend
>  somehow on a value that doesn't get loaded from shared memory until
>  later in the code!
>  
> @@ -497,7 +498,7 @@ THE READS-FROM RELATION: rf, rfi, and rfe
>  The reads-from relation (rf) links a write event to a read event when
>  the value loaded by the read is the value that was stored by the
>  write.  In colloquial terms, the load "reads from" the store.  We
> -write W ->rf R to indicate that the load R reads from the store W.  We
> +write ``W ->rf R`` to indicate that the load R reads from the store W.  We
>  further distinguish the cases where the load and the store occur on
>  the same CPU (internal reads-from, or rfi) and where they occur on
>  different CPUs (external reads-from, or rfe).
> @@ -510,7 +511,7 @@ Usage of the rf relation implicitly assumes that loads will always
>  read from a single store.  It doesn't apply properly in the presence
>  of load-tearing, where a load obtains some of its bits from one store
>  and some of them from another store.  Fortunately, use of READ_ONCE()
> -and WRITE_ONCE() will prevent load-tearing; it's not possible to have:
> +and WRITE_ONCE() will prevent load-tearing; it's not possible to have::
>  
>       int x = 0;
>  
> @@ -530,7 +531,7 @@ and end up with r1 = 0x1200 (partly from x's initial value and partly
>  from the value stored by P0).
>  
>  On the other hand, load-tearing is unavoidable when mixed-size
> -accesses are used.  Consider this example:
> +accesses are used.  Consider this example::
>  
>       union {
>               u32     w;
> @@ -578,26 +579,26 @@ that value comes third, and so on.
>  You can think of the coherence order as being the order in which the
>  stores reach x's location in memory (or if you prefer a more
>  hardware-centric view, the order in which the stores get written to
> -x's cache line).  We write W ->co W' if W comes before W' in the
> +x's cache line).  We write ``W ->co W'`` if W comes before W' in the
>  coherence order, that is, if the value stored by W gets overwritten,
>  directly or indirectly, by the value stored by W'.
>  
>  Coherence order is required to be consistent with program order.  This
>  requirement takes the form of four coherency rules:
>  
> -     Write-write coherence: If W ->po-loc W' (i.e., W comes before
> +     Write-write coherence: If ``W ->po-loc W'`` (i.e., W comes before
>       W' in program order and they access the same location), where W
> -     and W' are two stores, then W ->co W'.
> +     and W' are two stores, then ``W ->co W'``.
>  
> -     Write-read coherence: If W ->po-loc R, where W is a store and R
> +     Write-read coherence: If ``W ->po-loc R``, where W is a store and R
>       is a load, then R must read from W or from some other store
>       which comes after W in the coherence order.
>  
> -     Read-write coherence: If R ->po-loc W, where R is a load and W
> +     Read-write coherence: If ``R ->po-loc W``, where R is a load and W
>       is a store, then the store which R reads from must come before
>       W in the coherence order.
>  
> -     Read-read coherence: If R ->po-loc R', where R and R' are two
> +     Read-read coherence: If ``R ->po-loc R'``, where R and R' are two
>       loads, then either they read from the same store or else the
>       store read by R comes before the store read by R' in the
>       coherence order.
> @@ -612,7 +613,7 @@ requirement that every store eventually becomes visible to every CPU.)
>  Any reasonable memory model will include cache coherence.  Indeed, our
>  expectation of cache coherence is so deeply ingrained that violations
>  of its requirements look more like hardware bugs than programming
> -errors:
> +errors::
>  
>       int x;
>  
> @@ -628,6 +629,8 @@ write-write coherence rule: Since the store of 23 comes later in
>  program order, it must also come later in x's coherence order and
>  thus must overwrite the store of 17.
>  
> +::
> +
>       int x = 0;
>  
>       P0()
> @@ -644,6 +647,8 @@ program order, so it must not read from that store but rather from one
>  coming earlier in the coherence order (in this case, x's initial
>  value).
>  
> +::
> +
>       int x = 0;
>  
>       P0()
> @@ -689,12 +694,12 @@ THE FROM-READS RELATION: fr, fri, and fre
>  
>  The from-reads relation (fr) can be a little difficult for people to
>  grok.  It describes the situation where a load reads a value that gets
> -overwritten by a store.  In other words, we have R ->fr W when the
> +overwritten by a store.  In other words, we have ``R ->fr W`` when the
>  value that R reads is overwritten (directly or indirectly) by W, or
>  equivalently, when R reads from a store which comes earlier than W in
>  the coherence order.
>  
> -For example:
> +For example::
>  
>       int x = 0;
>  
> @@ -718,9 +723,11 @@ different CPUs).
>  
>  Note that the fr relation is determined entirely by the rf and co
>  relations; it is not independent.  Given a read event R and a write
> -event W for the same location, we will have R ->fr W if and only if
> +event W for the same location, we will have ``R ->fr W`` if and only if
>  the write which R reads from is co-before W.  In symbols,
>  
> +::
> +
>       (R ->fr W) := (there exists W' with W' ->rf R and W' ->co W).
>  
>  
> @@ -843,13 +850,13 @@ defined to link memory access events E and F whenever:
>       event occurs between them in program order; or
>  
>       F is a release fence and some X comes before F in program order,
> -     where either X = E or else E ->rf X; or
> +     where either ``X = E`` or else ``E ->rf X``; or
>  
>       A strong fence event occurs between some X and F in program
> -     order, where either X = E or else E ->rf X.
> +     order, where either ``X = E`` or else ``E ->rf X``.
>  
>  The operational model requires that whenever W and W' are both stores
> -and W ->cumul-fence W', then W must propagate to any given CPU
> +and ``W ->cumul-fence W'``, then W must propagate to any given CPU
>  before W' does.  However, for different CPUs C and C', it does not
>  require W to propagate to C before W' propagates to C'.
>  
> @@ -903,11 +910,11 @@ first for CPU 0, then CPU 1, etc.
>  
>  You can check that the four coherency rules imply that the rf, co, fr,
>  and po-loc relations agree with this global ordering; in other words,
> -whenever we have X ->rf Y or X ->co Y or X ->fr Y or X ->po-loc Y, the
> +whenever we have ``X ->rf Y`` or ``X ->co Y`` or ``X ->fr Y`` or ``X ->po-loc Y``, the
>  X event comes before the Y event in the global ordering.  The LKMM's
>  "coherence" axiom expresses this by requiring the union of these
>  relations not to have any cycles.  This means it must not be possible
> -to find events
> +to find events::
>  
>       X0 -> X1 -> X2 -> ... -> Xn -> X0,
>  
> @@ -929,7 +936,7 @@ this case) does not get altered between the read and the write events
>  making up the atomic operation.  In particular, if two CPUs perform
>  atomic_inc(&x) concurrently, it must be guaranteed that the final
>  value of x will be the initial value plus two.  We should never have
> -the following sequence of events:
> +the following sequence of events::
>  
>       CPU 0 loads x obtaining 13;
>                                       CPU 1 loads x obtaining 13;
> @@ -951,6 +958,8 @@ atomic read-modify-write and W' is the write event which R reads from,
>  there must not be any stores coming between W' and W in the coherence
>  order.  Equivalently,
>  
> +::
> +
>       (R ->rmw W) implies (there is no X with R ->fr X and X ->co W),
>  
>  where the rmw relation links the read and write events making up each
> @@ -968,7 +977,7 @@ po.
>  
>  The operational model already includes a description of one such
>  situation: Fences are a source of ppo links.  Suppose X and Y are
> -memory accesses with X ->po Y; then the CPU must execute X before Y if
> +memory accesses with ``X ->po Y``; then the CPU must execute X before Y if
>  any of the following hold:
>  
>       A strong (smp_mb() or synchronize_rcu()) fence occurs between
> @@ -987,7 +996,7 @@ any of the following hold:
>  Another possibility, not mentioned earlier but discussed in the next
>  section, is:
>  
> -     X and Y are both loads, X ->addr Y (i.e., there is an address
> +     X and Y are both loads, ``X ->addr Y`` (i.e., there is an address
>       dependency from X to Y), and X is a READ_ONCE() or an atomic
>       access.
>  
> @@ -1021,7 +1030,7 @@ includes address dependencies to loads in the ppo relation.
>  
>  On the other hand, dependencies can indirectly affect the ordering of
>  two loads.  This happens when there is a dependency from a load to a
> -store and a second, po-later load reads from that store:
> +store and a second, po-later load reads from that store::
>  
>       R ->dep W ->rfi R',
>  
> @@ -1036,7 +1045,7 @@ R; if the speculation turns out to be wrong then the CPU merely has to
>  restart or abandon R'.
>  
>  (In theory, a CPU might forward a store to a load when it runs across
> -an address dependency like this:
> +an address dependency like this::
>  
>       r1 = READ_ONCE(ptr);
>       WRITE_ONCE(*r1, 17);
> @@ -1048,7 +1057,7 @@ However, none of the architectures supported by the Linux kernel do
>  this.)
>  
>  Two memory accesses of the same location must always be executed in
> -program order if the second access is a store.  Thus, if we have
> +program order if the second access is a store.  Thus, if we have::
>  
>       R ->po-loc W
>  
> @@ -1056,7 +1065,7 @@ program order if the second access is a store.  Thus, if we have
>  access the same location), the CPU is obliged to execute W after R.
>  If it executed W first then the memory subsystem would respond to R's
>  read request with the value stored by W (or an even later store), in
> -violation of the read-write coherence rule.  Similarly, if we had
> +violation of the read-write coherence rule.  Similarly, if we had::
>  
>       W ->po-loc W'
>  
> @@ -1074,7 +1083,7 @@ AND THEN THERE WAS ALPHA
>  
>  As mentioned above, the Alpha architecture is unique in that it does
>  not appear to respect address dependencies to loads.  This means that
> -code such as the following:
> +code such as the following::
>  
>       int x = 0;
>       int y = -1;
> @@ -1128,7 +1137,7 @@ nothing at all on non-Alpha builds) after every READ_ONCE() and atomic
>  load.  The effect of the fence is to cause the CPU not to execute any
>  po-later instructions until after the local cache has finished
>  processing all the stores it has already received.  Thus, if the code
> -was changed to:
> +was changed to::
>  
>       P1()
>       {
> @@ -1167,25 +1176,25 @@ The happens-before relation (hb) links memory accesses that have to
>  execute in a certain order.  hb includes the ppo relation and two
>  others, one of which is rfe.
>  
> -W ->rfe R implies that W and R are on different CPUs.  It also means
> +``W ->rfe R`` implies that W and R are on different CPUs.  It also means
>  that W's store must have propagated to R's CPU before R executed;
>  otherwise R could not have read the value stored by W.  Therefore W
> -must have executed before R, and so we have W ->hb R.
> +must have executed before R, and so we have ``W ->hb R``.
>  
> -The equivalent fact need not hold if W ->rfi R (i.e., W and R are on
> +The equivalent fact need not hold if ``W ->rfi R`` (i.e., W and R are on
>  the same CPU).  As we have already seen, the operational model allows
>  W's value to be forwarded to R in such cases, meaning that R may well
>  execute before W does.
>  
>  It's important to understand that neither coe nor fre is included in
>  hb, despite their similarities to rfe.  For example, suppose we have
> -W ->coe W'.  This means that W and W' are stores to the same location,
> +``W ->coe W'``.  This means that W and W' are stores to the same location,
>  they execute on different CPUs, and W comes before W' in the coherence
>  order (i.e., W' overwrites W).  Nevertheless, it is possible for W' to
>  execute before W, because the decision as to which store overwrites
>  the other is made later by the memory subsystem.  When the stores are
>  nearly simultaneous, either one can come out on top.  Similarly,
> -R ->fre W means that W overwrites the value which R reads, but it
> +``R ->fre W`` means that W overwrites the value which R reads, but it
>  doesn't mean that W has to execute after R.  All that's necessary is
>  for the memory subsystem not to propagate W to R's CPU until after R
>  has executed, which is possible if W executes shortly before R.
> @@ -1199,7 +1208,7 @@ the first event in the coherence order and propagates to C before the
>  second event executes.
>  
>  This is best explained with some examples.  The simplest case looks
> -like this:
> +like this::
>  
>       int x;
>  
> @@ -1225,7 +1234,7 @@ event, because P1's store came after P0's store in x's coherence
>  order, and P1's store propagated to P0 before P0's load executed.
>  
>  An equally simple case involves two loads of the same location that
> -read from different stores:
> +read from different stores::
>  
>       int x = 0;
>  
> @@ -1252,7 +1261,7 @@ P1's store propagated to P0 before P0's second load executed.
>  
>  Less trivial examples of prop all involve fences.  Unlike the simple
>  examples above, they can require that some instructions are executed
> -out of program order.  This next one should look familiar:
> +out of program order.  This next one should look familiar::
>  
>       int buf = 0, flag = 0;
>  
> @@ -1303,7 +1312,7 @@ rfe link.  You can concoct more exotic examples, containing more than
>  one fence, although this quickly leads to diminishing returns in terms
>  of complexity.  For instance, here's an example containing a coe link
>  followed by two fences and an rfe link, utilizing the fact that
> -release fences are A-cumulative:
> +release fences are A-cumulative::
>  
>       int x, y, z;
>  
> @@ -1363,7 +1372,7 @@ links.  Let's see how this definition works out.
>  
>  Consider first the case where E is a store (implying that the sequence
>  of links begins with coe).  Then there are events W, X, Y, and Z such
> -that:
> +that::
>  
>       E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
>  
> @@ -1384,13 +1393,13 @@ The existence of a pb link from E to F implies that E must execute
>  before F.  To see why, suppose that F executed first.  Then W would
>  have propagated to E's CPU before E executed.  If E was a store, the
>  memory subsystem would then be forced to make E come after W in the
> -coherence order, contradicting the fact that E ->coe W.  If E was a
> +coherence order, contradicting the fact that ``E ->coe W``.  If E was a
>  load, the memory subsystem would then be forced to satisfy E's read
>  request with the value stored by W or an even later store,
> -contradicting the fact that E ->fre W.
> +contradicting the fact that ``E ->fre W``.
>  
>  A good example illustrating how pb works is the SB pattern with strong
> -fences:
> +fences::
>  
>       int x = 0, y = 0;
>  
> @@ -1449,18 +1458,18 @@ span a full grace period.  In more detail, the Guarantee says:
>       For any critical section C and any grace period G, at least
>       one of the following statements must hold:
>  
> -(1)  C ends before G does, and in addition, every store that
> -     propagates to C's CPU before the end of C must propagate to
> -     every CPU before G ends.
> +     (1)     C ends before G does, and in addition, every store that
> +             propagates to C's CPU before the end of C must propagate to
> +             every CPU before G ends.
>  
> -(2)  G starts before C does, and in addition, every store that
> -     propagates to G's CPU before the start of G must propagate
> -     to every CPU before C starts.
> +     (2)     G starts before C does, and in addition, every store that
> +             propagates to G's CPU before the start of G must propagate
> +             to every CPU before C starts.
>  
>  In particular, it is not possible for a critical section to both start
>  before and end after a grace period.
>  
> -Here is a simple example of RCU in action:
> +Here is a simple example of RCU in action::
>  
>       int x, y;
>  
> @@ -1509,9 +1518,9 @@ entirely clear.  The LKMM formalizes this notion by 
> means of the
>  rcu-link relation.  rcu-link encompasses a very general notion of
>  "before": If E and F are RCU fence events (i.e., rcu_read_lock(),
>  rcu_read_unlock(), or synchronize_rcu()) then among other things,
> -E ->rcu-link F includes cases where E is po-before some memory-access
> +``E ->rcu-link F`` includes cases where E is po-before some memory-access
>  event X, F is po-after some memory-access event Y, and we have any of
> -X ->rfe Y, X ->co Y, or X ->fr Y.
> +``X ->rfe Y``, ``X ->co Y``, or ``X ->fr Y``.
>  
>  The formal definition of the rcu-link relation is more than a little
>  obscure, and we won't give it here.  It is closely related to the pb
> @@ -1523,50 +1532,50 @@ The LKMM also defines the rcu-gp and rcu-rscsi 
> relations.  They bring
>  grace periods and read-side critical sections into the picture, in the
>  following way:
>  
> -     E ->rcu-gp F means that E and F are in fact the same event,
> +     ``E ->rcu-gp F`` means that E and F are in fact the same event,
>       and that event is a synchronize_rcu() fence (i.e., a grace
>       period).
>  
> -     E ->rcu-rscsi F means that E and F are the rcu_read_unlock()
> +     ``E ->rcu-rscsi F`` means that E and F are the rcu_read_unlock()
>       and rcu_read_lock() fence events delimiting some read-side
>       critical section.  (The 'i' at the end of the name emphasizes
>       that this relation is "inverted": It links the end of the
>       critical section to the start.)
>  
>  If we think of the rcu-link relation as standing for an extended
> -"before", then X ->rcu-gp Y ->rcu-link Z roughly says that X is a
> +"before", then ``X ->rcu-gp Y ->rcu-link Z`` roughly says that X is a
>  grace period which ends before Z begins.  (In fact it covers more than
>  this, because it also includes cases where some store propagates to
>  Z's CPU before Z begins but doesn't propagate to some other CPU until
> -after X ends.)  Similarly, X ->rcu-rscsi Y ->rcu-link Z says that X is
> +after X ends.)  Similarly, ``X ->rcu-rscsi Y ->rcu-link Z`` says that X is
>  the end of a critical section which starts before Z begins.
>  
>  The LKMM goes on to define the rcu-fence relation as a sequence of
>  rcu-gp and rcu-rscsi links separated by rcu-link links, in which the
>  number of rcu-gp links is >= the number of rcu-rscsi links.  For
> -example:
> +example::
>  
>       X ->rcu-gp Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
>  
> -would imply that X ->rcu-fence V, because this sequence contains two
> +would imply that ``X ->rcu-fence V``, because this sequence contains two
>  rcu-gp links and one rcu-rscsi link.  (It also implies that
> -X ->rcu-fence T and Z ->rcu-fence V.)  On the other hand:
> +``X ->rcu-fence T`` and ``Z ->rcu-fence V``.)  On the other hand::
>  
>       X ->rcu-rscsi Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
>  
> -does not imply X ->rcu-fence V, because the sequence contains only
> +does not imply ``X ->rcu-fence V``, because the sequence contains only
>  one rcu-gp link but two rcu-rscsi links.
>  
>  The rcu-fence relation is important because the Grace Period Guarantee
>  means that rcu-fence acts kind of like a strong fence.  In particular,
> -E ->rcu-fence F implies not only that E begins before F ends, but also
> +``E ->rcu-fence F`` implies not only that E begins before F ends, but also
>  that any write po-before E will propagate to every CPU before any
>  instruction po-after F can execute.  (However, it does not imply that
>  E must execute before F; in fact, each synchronize_rcu() fence event
>  is linked to itself by rcu-fence as a degenerate case.)
>  
>  To prove this in full generality requires some intellectual effort.
> -We'll consider just a very simple case:
> +We'll consider just a very simple case::
>  
>       G ->rcu-gp W ->rcu-link Z ->rcu-rscsi F.
>  
> @@ -1595,13 +1604,13 @@ covered by rcu-fence.
>  Finally, the LKMM defines the RCU-before (rb) relation in terms of
>  rcu-fence.  This is done in essentially the same way as the pb
>  relation was defined in terms of strong-fence.  We will omit the
> -details; the end result is that E ->rb F implies E must execute
> -before F, just as E ->pb F does (and for much the same reasons).
> +details; the end result is that ``E ->rb F`` implies E must execute
> +before F, just as ``E ->pb F`` does (and for much the same reasons).
>  
>  Putting this all together, the LKMM expresses the Grace Period
>  Guarantee by requiring that the rb relation does not contain a cycle.
>  Equivalently, this "rcu" axiom requires that there are no events E
> -and F with E ->rcu-link F ->rcu-fence E.  Or to put it a third way,
> +and F with ``E ->rcu-link F ->rcu-fence E``.  Or to put it a third way,
>  the axiom requires that there are no cycles consisting of rcu-gp and
>  rcu-rscsi alternating with rcu-link, where the number of rcu-gp links
>  is >= the number of rcu-rscsi links.
> @@ -1621,7 +1630,7 @@ question, and let S be the synchronize_rcu() fence 
> event for the grace
>  period.  Saying that the critical section starts before S means there
>  are events Q and R where Q is po-after L (which marks the start of the
>  critical section), Q is "before" R in the sense used by the rcu-link
> -relation, and R is po-before the grace period S.  Thus we have:
> +relation, and R is po-before the grace period S.  Thus we have::
>  
>       L ->rcu-link S.
>  
> @@ -1629,20 +1638,20 @@ Let W be the store mentioned above, let Y come before 
> the end of the
>  critical section and witness that W propagates to the critical
>  section's CPU by reading from W, and let Z on some arbitrary CPU be a
>  witness that W has not propagated to that CPU, where Z happens after
> -some event X which is po-after S.  Symbolically, this amounts to:
> +some event X which is po-after S.  Symbolically, this amounts to::
>  
>       S ->po X ->hb* Z ->fr W ->rf Y ->po U.
>  
>  The fr link from Z to W indicates that W has not propagated to Z's CPU
>  at the time that Z executes.  From this, it can be shown (see the
>  discussion of the rcu-link relation earlier) that S and U are related
> -by rcu-link:
> +by rcu-link::
>  
>       S ->rcu-link U.
>  
> -Since S is a grace period we have S ->rcu-gp S, and since L and U are
> -the start and end of the critical section C we have U ->rcu-rscsi L.
> -From this we obtain:
> +Since S is a grace period we have ``S ->rcu-gp S``, and since L and U are
> +the start and end of the critical section C we have ``U ->rcu-rscsi L``.
> +From this we obtain::
>  
>       S ->rcu-gp S ->rcu-link U ->rcu-rscsi L ->rcu-link S,
>  
> @@ -1651,7 +1660,7 @@ the Grace Period Guarantee.
>  
>  For something a little more down-to-earth, let's see how the axiom
>  works out in practice.  Consider the RCU code example from above, this
> -time with statement labels added:
> +time with statement labels added::
>  
>       int x, y;
>  
> @@ -1674,20 +1683,20 @@ time with statement labels added:
>  
>  
>  If r2 = 0 at the end then P0's store at Y overwrites the value that
> -P1's load at W reads from, so we have W ->fre Y.  Since S ->po W and
> -also Y ->po U, we get S ->rcu-link U.  In addition, S ->rcu-gp S
> +P1's load at W reads from, so we have ``W ->fre Y``.  Since ``S ->po W`` and
> +also ``Y ->po U``, we get ``S ->rcu-link U``.  In addition, ``S ->rcu-gp S``
>  because S is a grace period.
>  
>  If r1 = 1 at the end then P1's load at Z reads from P0's store at X,
> -so we have X ->rfe Z.  Together with L ->po X and Z ->po S, this
> -yields L ->rcu-link S.  And since L and U are the start and end of a
> -critical section, we have U ->rcu-rscsi L.
> +so we have ``X ->rfe Z``.  Together with ``L ->po X`` and ``Z ->po S``, this
> +yields ``L ->rcu-link S``.  And since L and U are the start and end of a
> +critical section, we have ``U ->rcu-rscsi L``.
>  
> -Then U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U is a
> +Then ``U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U`` is a
>  forbidden cycle, violating the "rcu" axiom.  Hence the outcome is not
>  allowed by the LKMM, as we would expect.
>  
> -For contrast, let's see what can happen in a more complicated example:
> +For contrast, let's see what can happen in a more complicated example::
>  
>       int x, y, z;
>  
> @@ -1720,28 +1729,28 @@ For contrast, let's see what can happen in a more 
> complicated example:
>               U2: rcu_read_unlock();
>       }
>  
> -If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
> -that U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
> -L2 ->rcu-link U0.  However this cycle is not forbidden, because the
> +If ``r0 = r1 = r2 = 1`` at the end, then similar reasoning to before shows
> +that ``U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
> +L2 ->rcu-link U0``.  However this cycle is not forbidden, because the
>  sequence of relations contains fewer instances of rcu-gp (one) than of
>  rcu-rscsi (two).  Consequently the outcome is allowed by the LKMM.
>  The following instruction timing diagram shows how it might actually
> -occur:
> -
> -P0                   P1                      P2
> --------------------- --------------------    --------------------
> -rcu_read_lock()
> -WRITE_ONCE(y, 1)
> -                     r1 = READ_ONCE(y)
> -                     synchronize_rcu() starts
> -                     .                       rcu_read_lock()
> -                     .                       WRITE_ONCE(x, 1)
> -r0 = READ_ONCE(x)    .
> -rcu_read_unlock()    .
> -                     synchronize_rcu() ends
> -                     WRITE_ONCE(z, 1)
> -                                             r2 = READ_ONCE(z)
> -                                             rcu_read_unlock()
> +occur::
> +
> +     P0                      P1                      P2
> +     --------------------    --------------------    --------------------
> +     rcu_read_lock()
> +     WRITE_ONCE(y, 1)
> +                             r1 = READ_ONCE(y)
> +                             synchronize_rcu() starts
> +                             .                       rcu_read_lock()
> +                             .                       WRITE_ONCE(x, 1)
> +     r0 = READ_ONCE(x)       .
> +     rcu_read_unlock()       .
> +                             synchronize_rcu() ends
> +                             WRITE_ONCE(z, 1)
> +                                                     r2 = READ_ONCE(z)
> +                                                     rcu_read_unlock()
>  
>  This requires P0 and P2 to execute their loads and stores out of
>  program order, but of course they are allowed to do so.  And as you
> @@ -1767,26 +1776,26 @@ The LKMM includes locking.  In fact, there is special 
> code for locking
>  in the formal model, added in order to make tools run faster.
>  However, this special code is intended to be more or less equivalent
>  to concepts we have already covered.  A spinlock_t variable is treated
> -the same as an int, and spin_lock(&s) is treated almost the same as:
> +the same as an int, and spin_lock(&s) is treated almost the same as::
>  
>       while (cmpxchg_acquire(&s, 0, 1) != 0)
>               cpu_relax();
>  
>  This waits until s is equal to 0 and then atomically sets it to 1,
>  and the read part of the cmpxchg operation acts as an acquire fence.
> -An alternate way to express the same thing would be:
> +An alternate way to express the same thing would be::
>  
>       r = xchg_acquire(&s, 1);
>  
>  along with a requirement that at the end, r = 0.  Similarly,
> -spin_trylock(&s) is treated almost the same as:
> +spin_trylock(&s) is treated almost the same as::
>  
>       return !cmpxchg_acquire(&s, 0, 1);
>  
>  which atomically sets s to 1 if it is currently equal to 0 and returns
>  true if it succeeds (the read part of the cmpxchg operation acts as an
>  acquire fence only if the operation is successful).  spin_unlock(&s)
> -is treated almost the same as:
> +is treated almost the same as::
>  
>       smp_store_release(&s, 0);
>  
> @@ -1802,7 +1811,7 @@ requires that every instruction po-before the 
> lock-release must
>  execute before any instruction po-after the lock-acquire.  This would
>  naturally hold if the release and acquire operations were on different
>  CPUs, but the LKMM says it holds even when they are on the same CPU.
> -For example:
> +For example::
>  
>       int x, y;
>       spinlock_t s;
> @@ -1833,7 +1842,7 @@ MP pattern).
>  
>  This requirement does not apply to ordinary release and acquire
>  fences, only to lock-related operations.  For instance, suppose P0()
> -in the example had been written as:
> +in the example had been written as::
>  
>       P0()
>       {
> @@ -1847,7 +1856,7 @@ in the example had been written as:
>  
>  Then the CPU would be allowed to forward the s = 1 value from the
>  smp_store_release() to the smp_load_acquire(), executing the
> -instructions in the following order:
> +instructions in the following order::
>  
>               r3 = smp_load_acquire(&s);      // Obtains r3 = 1
>               r2 = READ_ONCE(y);
> @@ -1859,7 +1868,7 @@ and thus it could load y before x, obtaining r2 = 0 and 
> r1 = 1.
>  Second, when a lock-acquire reads from a lock-release, and some other
>  stores W and W' occur po-before the lock-release and po-after the
>  lock-acquire respectively, the LKMM requires that W must propagate to
> -each CPU before W' does.  For example, consider:
> +each CPU before W' does.  For example, consider::
>  
>       int x, y;
>       spinlock_t x;
> @@ -1928,7 +1937,7 @@ smp_store_release().)  The final effect is the same.
>  Although we didn't mention it above, the instruction execution
>  ordering provided by the smp_rmb() fence doesn't apply to read events
>  that are part of a non-value-returning atomic update.  For instance,
> -given:
> +given::
>  
>       atomic_inc(&x);
>       smp_rmb();
> @@ -1967,14 +1976,14 @@ they behave as follows:
>       events.
>  
>  Interestingly, RCU and locking each introduce the possibility of
> -deadlock.  When faced with code sequences such as:
> +deadlock.  When faced with code sequences such as::
>  
>       spin_lock(&s);
>       spin_lock(&s);
>       spin_unlock(&s);
>       spin_unlock(&s);
>  
> -or:
> +or::
>  
>       rcu_read_lock();
>       synchronize_rcu();
> @@ -1984,7 +1993,7 @@ what does the LKMM have to say?  Answer: It says there 
> are no allowed
>  executions at all, which makes sense.  But this can also lead to
>  misleading results, because if a piece of code has multiple possible
>  executions, some of which deadlock, the model will report only on the
> -non-deadlocking executions.  For example:
> +non-deadlocking executions.  For example::
>  
>       int x, y;
>  
> diff --git a/tools/memory-model/Documentation/index.rst 
> b/tools/memory-model/Documentation/index.rst
> new file mode 100644
> index 000000000000..0e53d83a5a48
> --- /dev/null
> +++ b/tools/memory-model/Documentation/index.rst
> @@ -0,0 +1,20 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +========================
> +Linux Memory Model Tools
> +========================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   explanation
> +   recipes
> +   references
> +   cheatsheet
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> diff --git a/tools/memory-model/Documentation/recipes.txt 
> b/tools/memory-model/Documentation/recipes.rst
> similarity index 96%
> rename from tools/memory-model/Documentation/recipes.txt
> rename to tools/memory-model/Documentation/recipes.rst
> index 7fe8d7aa3029..0229a431b1a2 100644
> --- a/tools/memory-model/Documentation/recipes.txt
> +++ b/tools/memory-model/Documentation/recipes.rst
> @@ -1,3 +1,8 @@
> +=======
> +Recipes
> +=======
> +
> +
>  This document provides "recipes", that is, litmus tests for commonly
>  occurring situations, as well as a few that illustrate subtly broken but
>  attractive nuisances.  Many of these recipes include example code from
> @@ -67,7 +72,7 @@ has acquired a given lock sees any changes previously seen 
> or made by any
>  CPU before it released that same lock.  Note that this statement is a bit
>  stronger than "Any CPU holding a given lock sees all changes made by any
>  CPU during the time that CPU was holding this same lock".  For example,
> -consider the following pair of code fragments:
> +consider the following pair of code fragments::
>  
>       /* See MP+polocks.litmus. */
>       void CPU0(void)
> @@ -93,7 +98,7 @@ value of r1 must also be equal to 1.  In contrast, the 
> weaker rule would
>  say nothing about the final value of r1.
>  
>  The converse to the basic rule also holds, as illustrated by the
> -following litmus test:
> +following litmus test::
>  
>       /* See MP+porevlocks.litmus. */
>       void CPU0(void)
> @@ -124,7 +129,7 @@ across multiple CPUs.
>  
>  However, it is not necessarily the case that accesses ordered by
>  locking will be seen as ordered by CPUs not holding that lock.
> -Consider this example:
> +Consider this example::
>  
>       /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
>       void CPU0(void)
> @@ -157,7 +162,7 @@ CPU2() never acquired the lock, and thus did not benefit 
> from the
>  lock's ordering properties.
>  
>  Ordering can be extended to CPUs not holding the lock by careful use
> -of smp_mb__after_spinlock():
> +of smp_mb__after_spinlock()::
>  
>       /* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
>       void CPU0(void)
> @@ -214,7 +219,7 @@ Release and acquire
>  ~~~~~~~~~~~~~~~~~~~
>  
>  Use of smp_store_release() and smp_load_acquire() is one way to force
> -the desired MP ordering.  The general approach is shown below:
> +the desired MP ordering.  The general approach is shown below::
>  
>       /* See MP+pooncerelease+poacquireonce.litmus. */
>       void CPU0(void)
> @@ -245,7 +250,7 @@ Assign and dereference
>  Use of rcu_assign_pointer() and rcu_dereference() is quite similar to the
>  use of smp_store_release() and smp_load_acquire(), except that both
>  rcu_assign_pointer() and rcu_dereference() operate on RCU-protected
> -pointers.  The general approach is shown below:
> +pointers.  The general approach is shown below::
>  
>       /* See MP+onceassign+derefonce.litmus. */
>       int z;
> @@ -290,7 +295,7 @@ Write and read memory barriers
>  It is usually better to use smp_store_release() instead of smp_wmb()
>  and to use smp_load_acquire() instead of smp_rmb().  However, the older
>  smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
> -to understand their use cases.  The general approach is shown below:
> +to understand their use cases.  The general approach is shown below::
>  
>       /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
>       void CPU0(void)
> @@ -312,7 +317,7 @@ smp_rmb() macro orders prior loads against later loads.  
> Therefore, if
>  the final value of r0 is 1, the final value of r1 must also be 1.
>  
>  The xlog_state_switch_iclogs() function in fs/xfs/xfs_log.c contains
> -the following write-side code fragment:
> +the following write-side code fragment::
>  
>       log->l_curr_block -= log->l_logBBsize;
>       ASSERT(log->l_curr_block >= 0);
> @@ -327,7 +332,7 @@ the corresponding read-side code fragment:
>       cur_block = READ_ONCE(log->l_curr_block);
>  
>  Alternatively, consider the following comment in function
> -perf_output_put_handle() in kernel/events/ring_buffer.c:
> +perf_output_put_handle() in kernel/events/ring_buffer.c::
>  
>        *   kernel                             user
>        *
> @@ -358,7 +363,7 @@ absence of any ordering it is quite possible that this 
> may happen, as
>  can be seen in the LB+poonceonces.litmus litmus test.
>  
>  One way of avoiding the counter-intuitive outcome is through the use of a
> -control dependency paired with a full memory barrier:
> +control dependency paired with a full memory barrier::
>  
>       /* See LB+fencembonceonce+ctrlonceonce.litmus. */
>       void CPU0(void)
> @@ -382,7 +387,7 @@ The A/D pairing from the ring-buffer use case shown 
> earlier also
>  illustrates LB.  Here is a repeat of the comment in
>  perf_output_put_handle() in kernel/events/ring_buffer.c, showing a
>  control dependency on the kernel side and a full memory barrier on
> -the user side:
> +the user side::
>  
>        *   kernel                             user
>        *
> @@ -407,7 +412,7 @@ Release-acquire chains
>  
>  Release-acquire chains are a low-overhead, flexible, and easy-to-use
>  method of maintaining order.  However, they do have some limitations that
> -need to be fully understood.  Here is an example that maintains order:
> +need to be fully understood.  Here is an example that maintains order::
>  
>       /* See ISA2+pooncerelease+poacquirerelease+poacquireonce.litmus. */
>       void CPU0(void)
> @@ -436,7 +441,7 @@ example, ordering would still be preserved if CPU1()'s 
> smp_load_acquire()
>  invocation was replaced with READ_ONCE().
>  
>  It is tempting to assume that CPU0()'s store to x is globally ordered
> -before CPU1()'s store to z, but this is not the case:
> +before CPU1()'s store to z, but this is not the case::
>  
>       /* See Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus. */
>       void CPU0(void)
> @@ -474,7 +479,7 @@ Store buffering
>  Store buffering can be thought of as upside-down load buffering, so
>  that one CPU first stores to one variable and then loads from a second,
>  while another CPU stores to the second variable and then loads from the
> -first.  Preserving order requires nothing less than full barriers:
> +first.  Preserving order requires nothing less than full barriers::
>  
>       /* See SB+fencembonceonces.litmus. */
>       void CPU0(void)
> @@ -498,7 +503,7 @@ this counter-intuitive outcome.
>  This pattern most famously appears as part of Dekker's locking
>  algorithm, but it has a much more practical use within the Linux kernel
>  of ordering wakeups.  The following comment taken from waitqueue_active()
> -in include/linux/wait.h shows the canonical pattern:
> +in include/linux/wait.h shows the canonical pattern::
>  
>   *      CPU0 - waker                    CPU1 - waiter
>   *
> @@ -550,16 +555,16 @@ The strength of memory ordering required for a given 
> litmus test to
>  avoid a counter-intuitive outcome depends on the types of relations
>  linking the memory accesses for the outcome in question:
>  
> -o    If all links are write-to-read links, then the weakest
> +-    If all links are write-to-read links, then the weakest
>       possible ordering within each CPU suffices.  For example, in
>       the LB litmus test, a control dependency was enough to do the
>       job.
>  
> -o    If all but one of the links are write-to-read links, then a
> +-    If all but one of the links are write-to-read links, then a
>       release-acquire chain suffices.  Both the MP and the ISA2
>       litmus tests illustrate this case.
>  
> -o    If more than one of the links are something other than
> +-    If more than one of the links are something other than
>       write-to-read links, then a full memory barrier is required
>       between each successive pair of non-write-to-read links.  This
>       case is illustrated by the Z6.0 litmus tests, both in the
> diff --git a/tools/memory-model/Documentation/references.txt 
> b/tools/memory-model/Documentation/references.rst
> similarity index 71%
> rename from tools/memory-model/Documentation/references.txt
> rename to tools/memory-model/Documentation/references.rst
> index b177f3e4a614..275876cd10b8 100644
> --- a/tools/memory-model/Documentation/references.txt
> +++ b/tools/memory-model/Documentation/references.rst
> @@ -1,3 +1,7 @@
> +==========
> +References
> +==========
> +
>  This document provides background reading for memory models and related
>  tools.  These documents are aimed at kernel hackers who are interested
>  in memory models.
> @@ -6,64 +10,64 @@ in memory models.
>  Hardware manuals and models
>  ===========================
>  
> -o    SPARC International Inc. (Ed.). 1994. "The SPARC Architecture
> +-    SPARC International Inc. (Ed.). 1994. "The SPARC Architecture
>       Reference Manual Version 9". SPARC International Inc.
>  
> -o    Compaq Computer Corporation (Ed.). 2002. "Alpha Architecture
> +-    Compaq Computer Corporation (Ed.). 2002. "Alpha Architecture
>       Reference Manual".  Compaq Computer Corporation.
>  
> -o    Intel Corporation (Ed.). 2002. "A Formal Specification of Intel
> +-    Intel Corporation (Ed.). 2002. "A Formal Specification of Intel
>       Itanium Processor Family Memory Ordering". Intel Corporation.
>  
> -o    Intel Corporation (Ed.). 2002. "Intel 64 and IA-32 Architectures
> +-    Intel Corporation (Ed.). 2002. "Intel 64 and IA-32 Architectures
>       Software Developer’s Manual". Intel Corporation.
>  
> -o    Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli,
> +-    Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli,
>       and Magnus O. Myreen. 2010. "x86-TSO: A Rigorous and Usable
>       Programmer's Model for x86 Multiprocessors". Commun. ACM 53, 7
>       (July, 2010), 89-97. http://doi.acm.org/10.1145/1785414.1785443
>  
> -o    IBM Corporation (Ed.). 2009. "Power ISA Version 2.06". IBM
> +-    IBM Corporation (Ed.). 2009. "Power ISA Version 2.06". IBM
>       Corporation.
>  
> -o    ARM Ltd. (Ed.). 2009. "ARM Barrier Litmus Tests and Cookbook".
> +-    ARM Ltd. (Ed.). 2009. "ARM Barrier Litmus Tests and Cookbook".
>       ARM Ltd.
>  
> -o    Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and
> +-    Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and
>       Derek Williams.  2011. "Understanding POWER Multiprocessors". In
>       Proceedings of the 32Nd ACM SIGPLAN Conference on Programming
>       Language Design and Implementation (PLDI ’11). ACM, New York,
>       NY, USA, 175–186.
>  
> -o    Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty,
> +-    Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty,
>       Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams.
>       2012. "Synchronising C/C++ and POWER". In Proceedings of the 33rd
>       ACM SIGPLAN Conference on Programming Language Design and
>       Implementation (PLDI '12). ACM, New York, NY, USA, 311-322.
>  
> -o    ARM Ltd. (Ed.). 2014. "ARM Architecture Reference Manual (ARMv8,
> +-    ARM Ltd. (Ed.). 2014. "ARM Architecture Reference Manual (ARMv8,
>       for ARMv8-A architecture profile)". ARM Ltd.
>  
> -o    Imagination Technologies, LTD. 2015. "MIPS(R) Architecture
> +-    Imagination Technologies, LTD. 2015. "MIPS(R) Architecture
>       For Programmers, Volume II-A: The MIPS64(R) Instruction,
>       Set Reference Manual". Imagination Technologies,
>       LTD. https://imgtec.com/?do-download=4302.
>  
> -o    Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit
> +-    Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit
>       Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter
>       Sewell. 2016. "Modelling the ARMv8 Architecture, Operationally:
>       Concurrency and ISA". In Proceedings of the 43rd Annual ACM
>       SIGPLAN-SIGACT Symposium on Principles of Programming Languages
>       (POPL ’16). ACM, New York, NY, USA, 608–621.
>  
> -o    Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis,
> +-    Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis,
>       Luc Maranget, Kathryn E. Gray, Ali Sezgin, Mark Batty, and Peter
>       Sewell. 2017. "Mixed-size Concurrency: ARM, POWER, C/C++11,
>       and SC". In Proceedings of the 44th ACM SIGPLAN Symposium on
>       Principles of Programming Languages (POPL 2017). ACM, New York,
>       NY, USA, 429–442.
>  
> -o    Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
> +-    Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
>       Susmit Sarkar, and Peter Sewell. 2018. "Simplifying ARM concurrency:
>       multicopy-atomic axiomatic and operational models for ARMv8". In
>       Proceedings of the ACM on Programming Languages, Volume 2, Issue
> @@ -73,18 +77,18 @@ o Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
>  Linux-kernel memory model
>  =========================
>  
> -o    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>       Alan Stern.  2018. "Frightening small children and disconcerting
>       grown-ups: Concurrency in the Linux kernel". In Proceedings of
>       the 23rd International Conference on Architectural Support for
>       Programming Languages and Operating Systems (ASPLOS 2018). ACM,
>       New York, NY, USA, 405-418.  Webpage: http://diy.inria.fr/linux/.
>  
> -o    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>       Alan Stern.  2017.  "A formal kernel memory-ordering model (part 1)"
>       Linux Weekly News.  https://lwn.net/Articles/718628/
>  
> -o    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-    Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>       Alan Stern.  2017.  "A formal kernel memory-ordering model (part 2)"
>       Linux Weekly News.  https://lwn.net/Articles/720550/
>  
> @@ -92,16 +96,16 @@ o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea 
> Parri, and
>  Memory-model tooling
>  ====================
>  
> -o    Daniel Jackson. 2002. "Alloy: A Lightweight Object Modelling
> +-    Daniel Jackson. 2002. "Alloy: A Lightweight Object Modelling
>       Notation". ACM Trans. Softw. Eng. Methodol. 11, 2 (April 2002),
>       256–290. http://doi.acm.org/10.1145/505145.505149
>  
> -o    Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. "Herding
> +-    Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. "Herding
>       Cats: Modelling, Simulation, Testing, and Data Mining for Weak
>       Memory". ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July
>       2014), 7:1–7:74 pages.
>  
> -o    Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
> +-    Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
>       semantics of the weak consistency model specification language
>       cat". CoRR abs/1608.07531 (2016). http://arxiv.org/abs/1608.07531
>  
> @@ -109,6 +113,6 @@ o Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. 
> "Syntax and
>  Memory-model comparisons
>  ========================
>  
> -o    Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
> +-    Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
>       Feng. 2016. "Linux-Kernel Memory Model". (6 June 2016).
>       http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0124r2.html.
> diff --git a/tools/memory-model/README b/tools/memory-model/README
> index 2b87f3971548..04bb1fa9ed76 100644
> --- a/tools/memory-model/README
> +++ b/tools/memory-model/README
> @@ -105,16 +105,16 @@ for more information.
>  DESCRIPTION OF FILES
>  ====================
>  
> -Documentation/cheatsheet.txt
> +Documentation/cheatsheet.rst
>       Quick-reference guide to the Linux-kernel memory model.
>  
> -Documentation/explanation.txt
> +Documentation/explanation.rst
>       Describes the memory model in detail.
>  
> -Documentation/recipes.txt
> +Documentation/recipes.rst
>       Lists common memory-ordering patterns.
>  
> -Documentation/references.txt
> +Documentation/references.rst
>       Provides background reading.
>  
>  linux-kernel.bell
> @@ -173,7 +173,7 @@ The Linux-kernel memory model has the following 
> limitations:
>       of READ_ONCE() and WRITE_ONCE() limits the compiler's ability
>       to optimize, but there is Linux-kernel code that uses bare C
>       memory accesses.  Handling this code is on the to-do list.
> -     For more information, see Documentation/explanation.txt (in
> +     For more information, see Documentation/explanation.rst (in
>       particular, the "THE PROGRAM ORDER RELATION: po AND po-loc"
>       and "A WARNING" sections).
>  
> 
> 

Reply via email to