[clang] [clang] Define convergence in C++ languages such as HIP, CUDA, OpenCL (PR #136280)

Sameer Sahasrabuddhe via cfe-commits Fri, 18 Apr 2025 01:27:13 -0700

https://github.com/ssahasra created 
https://github.com/llvm/llvm-project/pull/136280


The proposed definition closely follows the existing definition of convergence 
in LLVM IR, but using C++ terms to describe language constructs.

There is no undefined behaviour. For each situation, convergence is either 
fully specified or implementation-defined.

Two important limitations where LLVM IR requires convergence control tokens to 
correctly express the convergence specified here:

1. Some combinations of loops, continue and break statements have different 
convergence specified for the statements inside that region of code, but result 
in the same loops in LLVM IR, thus producing ambiguous convergence in LLVM IR.

2. When a divergent condition inside a loop contains a convergent call followed 
by a break statement, these statements are lexically inside the loop, but in 
LLVM IR, they are outside the corresponding CFG loop.

>From 4ddf344b77cc01282571c643d621af34e6a7d8ad Mon Sep 17 00:00:00 2001
From: Sameer Sahasrabuddhe <sameer.sahasrabud...@amd.com>
Date: Fri, 18 Apr 2025 12:21:36 +0530
Subject: [PATCH] [clang] Define convergence in C++ languages such as HIP,
 CUDA, OpenCL

The proposed definition closely follows the existing definition of convergence
in LLVM IR, but using C++ terms to describe language constructs.

There is no undefined behaviour. For each situation, convergence is either fully
specified or implementation-defined.

Two important limitations where LLVM IR requires convergence control tokens to
correctly express the convergence specified here:

1. Some combinations of loops, continue and break statements have different
   convergence specified for the statements inside that region of code, but
   result in the same loops in LLVM IR, thus producing ambiguous convergence in
   LLVM IR.

2. When a divergent condition inside a loop contains a convergent call followed
   by a break statement, these statements are lexically inside the loop, but in
   LLVM IR, they are outside the corresponding CFG loop.
---
 clang/docs/ThreadConvergence.rst              | 795 ++++++++++++++++++
 clang/docs/conf.py                            |   4 +
 clang/docs/index.rst                          |   1 +
 clang/include/clang/AST/ParentMap.h           |  14 +-
 .../Analysis/Analyses/ConvergenceCheck.h      |  25 +
 clang/include/clang/Analysis/CFG.h            |   2 +
 clang/include/clang/Basic/AttrDocs.td         |  16 +-
 clang/include/clang/Basic/DiagnosticGroups.td |   3 +
 .../clang/Basic/DiagnosticSemaKinds.td        |  10 +
 clang/lib/AST/ParentMap.cpp                   |  65 +-
 clang/lib/Analysis/AnalysisDeclContext.cpp    |   2 +-
 clang/lib/Analysis/CMakeLists.txt             |   1 +
 clang/lib/Analysis/ConvergenceCheck.cpp       | 119 +++
 clang/lib/Sema/AnalysisBasedWarnings.cpp      |   7 +-
 clang/test/SemaHIP/convergence-warnings.hip   | 473 +++++++++++
 15 files changed, 1494 insertions(+), 43 deletions(-)
 create mode 100644 clang/docs/ThreadConvergence.rst
 create mode 100644 clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
 create mode 100644 clang/lib/Analysis/ConvergenceCheck.cpp
 create mode 100644 clang/test/SemaHIP/convergence-warnings.hip

diff --git a/clang/docs/ThreadConvergence.rst b/clang/docs/ThreadConvergence.rst
new file mode 100644
index 0000000000000..d872ab9cb77f5
--- /dev/null
+++ b/clang/docs/ThreadConvergence.rst
@@ -0,0 +1,795 @@
+==================
+Thread Convergence
+==================
+
+.. contents::
+   :local:
+
+Revisions
+=========
+
+- 2025/04/14 --- Created
+
+Introduction
+============
+
+Some languages such as OpenCL, CUDA and HIP execute threads in groups 
(typically
+on a GPU) that allow efficient communication within the group using special
+*crosslane* primitives. The outcome of a crosslane communication
+is sensitive to the set of threads that execute it "together", i.e.,
+`convergently`__. When control flow *diverges*, i.e., threads of the same group
+follow different paths through the program, not all threads of the group may be
+available to participate in this communication.
+
+__ https://llvm.org/docs/ConvergenceAndUniformity.html
+
+Crosslane Operations
+--------------------
+
+A *crosslane operation* is an expression whose evaluation by multiple threads
+produces a side-effect visible to all those threads in a manner that does not
+depend on volatile objects, library I/O functions or memory. The set of threads
+which participate in this communication is implicitly affected by control flow.
+
+For example, in the following GPU compute kernel, communication during the
+crosslane operation is expected to occur precisely among an environment-defined
+set of threads (such as workgroup or subgroup) for which ``condition`` is true:
+
+.. code-block:: c++
+   :caption: A crosslane operation
+   :name: convergence-example-crosslane-operation
+
+   void example_kernel() {
+      ...
+      if (condition)
+          crosslane_operation();
+      ...
+   }
+
+Thread Convergence
+------------------
+
+Whether two threads convergently execute an operation is different at every
+execution of that operation by those two threads. [Note: This corresponds to
+`dynamic instances in LLVM IR`__.] In a structured program, there is often an
+intuitive and unambiguous way of determining the threads that are converged at 
a
+particular operation. Threads may *diverge* at a *divergent branch*, and then
+*reconverge* at some later point in the program such as the end of an enclosing
+statement. However, this intuition does not work very well with unstructured
+control flow. In particular, when two threads enter an `irreducible cycle`__ in
+the control-flow graph along different paths, whether they converge inside the
+cycle and at which point depends on the choices made by the implementation.
+
+__ 
https://llvm.org/docs/ConvergenceAndUniformity.html#threads-and-dynamic-instances
+__ https://llvm.org/docs/CycleTerminology.html
+
+The intuitive picture of *convergence* is built around threads executing in
+"lock step" --- a set of threads is thought of as *converged* if they are all
+executing "the same sequence of instructions together". But this assumption is
+not necessary for describing communication at crosslane operations, and the
+convergence defined here *does not* assume that converged threads execute in
+lock-step.
+
+This document defines convergence at every evaluation in the program based on
+the state of the control-flow reaching that point in the source, including the
+iterations being performed by any enclosing loop statements. Convergence merely
+relates threads that must participate when a crosslane operation is executed.
+Such threads are not required to execute a crosslane operation "at the same
+time" or even on the same hardware resources. They may appear to do so in
+practice, but that is an implementation detail.
+
+.. _convergent-operation:
+
+Convergent Operations
+=====================
+
+A *convergent* operation is an expression marked with the attribute
+``convergent``. A *non-convergent* operation is an expression that is not 
marked
+as ``convergent``, and optionally marked with the attribute ``noconvergent``.
+
+In general, an implementation may not modify the set of converged threads
+associated with each evaluation of a convergent operation. But such
+optimizations are possible where the semantics of the specific convergent
+operation allows it. The specification for convergence control tokens in LLVM 
IR
+provides some `examples of correct transforms`__ in the presence of convergent
+operations.
+
+__ 
https://llvm.org/docs/ConvergentOperations.html#examples-for-the-correctness-of-program-transforms
+
+.. _convergence-thread-masks:
+
+Explicit Thread Masks
+---------------------
+
+Some languages like CUDA and HIP provide convergent operations that take an
+explicit threadmask as an argument. Threads are organized in groups called 
warps
+or waves, and a threadmask passed to a convergent operation specifies the
+threads within a warp that must participate in that convergent operation. The
+set of threads is explicitly specified by the programmer, rather than being
+implied by the control-flow of the program.
+
+The convergence defined in this document is not sufficient for describing the
+semantics of explicit threadmasks. The optimization constraints placed by these
+operations on the implementation are different from those placed by convergent
+operations with implicit threadmasks. At the same time, the convergence
+specified here is also not contradictory to that semantics --- it can still be
+used to determine the sets of threads that are potentially converged at each
+execution of such an operation.
+
+.. code-block:: C++
+   :caption: Explicit thread masks
+   :name: convergence-example-thread-masks
+
+   void crosslane_operation (unsigned long mask) __attribute__(("convergent"));
+
+   void bar(unsigned long mask) {
+     convergent_func(mask);
+   }
+
+   void foo() {
+     ...
+     auto mask = ...;
+
+     if (cond)
+       bar(mask); // B
+     else
+       bar(mask); // C
+   }
+
+The interpretation of the mask depends on the implementation:
+
+- On implementations where threads in a warp are assumed to execute in 
lock-step
+  (such as AMDGPU or for PTX specifying a target lower than sm_70), the mask
+  argument partitions this set of potentially converged threads into subsets of
+  threads that must be converged. In 
:numref:`convergence-example-thread-masks`,
+  threads that reach ``B`` (respectively ``C``) and have the same mask are
+  converged with each other when they eventually execute the call to
+  ``convergent_func``.
+- On implementations that allow full concurrency between threads (such as PTX
+  specifying sm_70 or higher targets), the mask argument partitions this set of
+  potentially converged threads into subsets of threads that converge with each
+  other, as well as with subsets executing other instances of the same
+  operation. In :numref:`convergence-example-thread-masks`, threads that 
execute
+  ``convergent_func`` as a result of reaching any one of ``B`` or ``C``
+  converged if they have the same mask, irrespective of whether they reached
+  ``convergent_func`` via ``B`` or ``C``.
+
+Cycles
+======
+
+Convergence is affected by `cycles in the control-flow graph`__ of the program.
+These may originate from iteration statements or from ``goto`` statements that
+transfer control to a label that occurs earlier in the program source. In
+particular, specifying convergence for irreducible cycles is cumbersome and
+likely to place unnecessary constraints on the implementation. Hence the
+convergence of threads in patterns that can potentially produce irreducible
+cycles is left to the implementation.
+
+__ https://llvm.org/docs/CycleTerminology.html
+
+  The *span* of a ``goto`` statement is the inclusive sequence of statements 
that
+  occur between a ``goto`` and its target label.
+
+  A *backwards jump* is a ``goto`` statement that transfers control to a label
+  that occurs before the ``goto`` in program source.
+
+  A *goto cycle* is the span of a backwards jump.
+
+  A *cycle* is either a *goto cycle* or an iteration statement.
+
+.. note::
+
+   To define a "backwards" jump, statements are ordered according to their
+   appearance in the sequence of tokens in a preprocessed source file. This
+   definition of a cycle is only a convenient approximation of a cycle in the
+   control-flow graph as defined by LLVM IR. Some instances of cycles defined
+   here may result in a different cycle in the corresponding control flow 
graph,
+   or maybe even no cycle at all.
+
+   For example:
+
+   - A backwards ``goto`` statement ``G`` that jumps out of an iteration
+     statement ``L`` may result in a control-flow cycle that includes ``L`` as 
a
+     child cycle.
+   - A ``goto`` in the ``else`` substatement of an ``if`` that jumps
+     to the "then" part of the ``if``.
+   - A ``goto`` in a ``switch`` statement that jumps backwards to a ``case``
+     that is not a fall-through.
+   - A ``goto`` that jumps backwards, but other subsequent jumps ensure that 
the
+     same ``goto`` is not encountered again except as a result of some outer
+     loop statement.
+
+   But such situations are rare and do not provide enough justification to
+   create a more detailed definition of cycles in the source code.
+
+Convergence
+===========
+
+*Converged-with* is a transitive symmetric relation over the evaluations of the
+same expression performed by different threads. In general, when two 
evaluations
+of the same expression performed by different threads are converged, they may
+communicate through any crosslane communication produced by that evaluation.
+
+*Convergence-before* is a strict partial order over evaluations
+performed by multiple threads. It is the transitive closure of:
+
+1. If evaluation ``P`` is sequenced-before ``Q``, then ``P`` is
+   *convergence-before* ``Q``.
+2. If evaluation ``P`` is sequenced-before ``Q1`` and ``Q1`` is 
*converged-with*
+   ``Q2``, then ``P`` is *convergence-before* ``Q2``.
+3. If evaluation ``P1`` is *converged-with* ``P2`` and ``P2`` is
+   sequenced-before ``Q``, then ``P1`` is *convergence-before* ``Q``.
+
+*Thread-converged-with* is a transitive symmetric relation over threads. For an
+expression ``E``, let ``S`` be the smallest statement that contains ``E``.
+
+- When two threads are converged at the execution of ``S``, they are also
+  converged at the evaluation of ``E`` if they both evaluate ``E``.
+- When two threads are converged at the evaluation of ``E``, those two
+  evaluations of ``E`` are also converged.
+
+Two evaluations are converged only if specified below as converged or
+implementation-defined.
+
+Mere convergence does not imply any memory synchronization or control-flow
+barriers.
+
+Function body
+-------------
+
+Whether two threads are converged at the start of a function body is determined
+at each invocation of that function.
+
+- When a function is invoked from outside the scope of the current program,
+  whether two threads are converged at this invocation is environment-defined.
+  For example:
+
+  - In an OpenCL kernel launch, the maximal set of threads that can communicate
+    outside the memory model is a workgroup. Hence, a suitable choice is to
+    specify that all the threads from a single OpenCL workgroup are pair-wise
+    converged at that launch of the kernel.
+  - In a C/C++ program, threads are launched independently and they can
+    communicate only through the memory model. Thus, a thread that enters a
+    C/C++ program (usually via the ``main`` function) is not converged with any
+    other thread.
+
+- When two threads are converged at a *convergent* function call in the 
program,
+  those two threads are converged at the start of the called function.
+
+Two threads that are converged at the beginning of a function are also 
converged
+when they exit the function by executing the same or different occurrences of
+the ``return`` statement in that function.
+
+.. _convergence-sequential-execution:
+
+Sequential Execution
+--------------------
+
+In C++, statements are executed in sequence unless control is transferred
+explicitly. Convergence follows this sequential execution.
+
+When two threads are converged at the execution of a statement ``S``, they are
+also converged at any substatement ``S1`` of ``S``, if every cycle that 
contains
+``S1`` also contains ``S`` and if they both reach ``S1`` during that execution
+of ``S``.
+
+.. code-block:: C++
+   :caption: Sequential execution at a branch
+   :name: convergence-example-sequential-branch
+
+   void foo() {
+     ... // A1
+     ... // A2
+     if (cond) {
+       ... // B1
+       ... // B2
+     } else {
+       ... // C
+     }
+     ... // D
+   }
+
+In :numref:`convergence-example-sequential-branch`, threads that are converged
+at the start of ``foo()`` are also converged at ``A1`` and ``A2``. Out of 
these,
+threads that evaluate ``cond`` to be ``true`` are converged at ``B1`` and
+``B2``. On the other hand, threads that evaluate ``cond`` to be ``false`` are
+converged at ``C``. All threads are finally converged at ``D`` when they reach
+there after finishing the ``if`` statement.
+
+.. code-block:: C++
+   :caption: Sequential execution in a loop
+   :name: convergence-example-sequential-loop
+
+   void foo() {
+     ... // A1
+     ... // A2
+     while (cond) {
+       ... // L1
+       ... // L2
+     }
+     ... // C
+   }
+
+In :numref:`convergence-example-sequential-loop`, threads that are converged at
+the start of ``foo()`` are converged at the start of the ``while`` loop and
+again at ``C``. But whether they are converged at the execution of statements
+inside the loop is determined by the rules for convergence inside iteration
+statements.
+
+Iteration Statement
+-------------------
+
+C++ expresses the semantics of the ``for`` statement and the ``ranged-for``
+statement in terms of the ``while`` statement. Similarly, convergence at
+different parts of these statements is defined as if that statement is replaced
+with the equivalent pattern using the ``while`` statement.
+
+An iteration statement ``S`` is said to be *reducible* if and only if for every
+label statement ``L`` that occurs inside ``S``, every ``goto`` or ``switch``
+statement that transfers control to ``L`` is also inside ``S``.
+
+The following rules apply to reducible iteration statements:
+
+- When two threads are converged at the execution of a ``do-while`` statement,
+  they are also converged at that first execution of the body substatement.
+- When two threads are converged at the execution of a ``while`` statement, 
they
+  are also converged at that first execution of the condition.
+- When two threads are converged at the execution of the condition, they are
+  also converged at the subsequent execution of the body substatement if they
+  both reach the body substatement.
+- When two threads are converged at the end of the body substatement, they are
+  also converged at the subsequent execution of the condition if they both 
reach
+  the condition.
+
+When an iteration statement ``S`` is not reducible, the convergence of threads
+at each substatement of ``S`` is implementation-defined.
+
+.. code-block:: C++
+   :caption: Iteration statement
+   :name: convergence-example-iteration-statement
+
+   void foo() {
+     ... // A1
+     ... // A2
+     while (cond1) {
+       ... // L1
+       if (cond2)
+         continue;
+       ... // L2
+       if (cond3)
+         break;
+       ... // L3
+     }
+     ... // C
+   }
+
+Consider the execution of the the function ``foo()`` shown in
+:numref:`convergence-example-iteration-statement`.
+
+- All threads that were converged at the start of ``foo()`` continue to be
+  converged at points ``A1`` and ``A2``.
+- Threads converged at ``A2`` for whom ``cond1`` evaluates to ``true`` execute
+  the loop body for the first time, and are converged at ``L1``.
+- Threads converged at ``L1`` for whom ``cond2`` evaluates to ``true`` transfer
+  control to the end of the loop body, while the remaining threads are 
converged
+  at ``L2``.
+- Threads converged at ``L2`` for whom ``cond3`` evaluates to ``true`` exit the
+  loop, while the remaining threads are converged at ``L3``.
+- All threads that were converged at the start of the loop body and did not 
exit
+  the loop body are converged at the end of the loop body, and at the 
subsequent
+  evaluation of ``cond1``.
+- All threads that were converged at the start of the ``while`` statement are
+  also converged at ``C``.
+
+.. code-block:: C++
+   :caption: Jump into loop
+   :name: convergence-example-jump-into-loop
+
+   void foo() {
+     ... // A
+     if (cond1)
+       goto inside_loop; // G1
+     ... // B
+     while (cond) {
+       ... // L1
+     inside_loop:
+       ... // L2
+       if (cond3) { // L3
+         ...        // L4
+         goto outside_loop; // G2
+       }
+       ... // L5
+     }
+     ... // C
+     outside_loop:
+     ... // D
+   }
+
+In :numref:`convergence-example-jump-into-loop`:
+
+- Convergence is implementation defined at the loop condition ``cond``, ``L1``,
+  ``L2``, ``L3``, and ``L5``.
+- Threads that are converged at ``L3`` are converged at ``L4`` and ``G2`` if
+  they enter the branch.
+- Threads that are converged at the start of the function are converged at
+  ``C``. This includes thread that jumped to ``inside_loop`` as well as threads
+  that reached the ``while`` loop after executing ``B``.
+- Threads that are converged at the start of the function are converged at
+  ``outside_loop``. This includes threads that jumped from ``G2`` as well as
+  threads that  reached ``outside_loop`` after executing ``C``.
+
+.. code-block:: C++
+   :caption: Duff's device
+   :name: convergence-example-duffs-device
+
+   void foo() {
+     ... // A
+     switch (value) {
+       case 1:
+         ... // C1
+         while (cond) {
+           ... // L
+           // note the fall-through
+       case 2:
+           ... // LC2
+         }
+         ... // C2
+         break;
+       case 3:
+         ... // C3
+     }
+     ... // D
+   }
+
+:numref:`convergence-example-duffs-device` shows how C++ allows the statements
+of a ``while`` loop to be interleaved with ``case`` labels of a ``switch``
+statement, resulting in irreducible control-flow.
+
+- Threads that are converged at the start of ``foo()`` are converged at the
+  start of the switch statement.
+- Convergence is implementation-defined at ``L`` and ``LC2``.
+- Threads that are converged at the start of the ``switch`` statement are
+  converged at ``C2`` if they reach ``C2``.
+- Threads that jump to ``case 3`` are converged at ``C3``.
+- Threads that are converged at the start of ``foo()`` are converged at ``D``.
+
+Jump Statements
+---------------
+
+A jump statement (i.e., ``goto`` or ``switch``) results in
+implementation-defined convergence only if it is a backwards jump or it
+transfers control into an iteration statement.
+
+- Whether two threads are converged at each statement in a ``goto`` cycle is
+  implementation-defined.
+- In a "straight-line jump" that does not jump into a loop, threads that make
+  the jump and threads that do not make the jump both converge at the target
+  label.
+
+.. code-block:: C++
+   :caption: Simple goto
+   :name: convergence-example-goto
+
+   void foo() {
+     ... // A
+     while (cond) {
+       ... // L1
+       if (cond)
+         goto label_X;
+       ... // L2
+     label_X: ...
+       ... // L3
+     }
+     ... // B
+   }
+
+Consider the execution of the the function ``foo()`` shown in
+:numref:`convergence-example-goto`.
+
+- Threads that are converged at ``L1`` are converged at ``L2`` if they reach
+  ``L2``.
+- Threads that are converged at ``L2`` are converged at ``label_X``.
+- Threads that are converged at the ``goto`` are converged at ``label_X``.
+- The body substatement contains ``label_X`` as well as every ``goto`` that
+  jumps to it, and is a compound statement that contains ``label_X``. Thus, all
+  threads that are converged at the start of the body substatement are 
converged
+  at ``label_X``. This includes the previous two sets of threads converged at
+  ``label_X``.
+- Threads that are converged at ``label_X`` are converged at ``L3``.
+
+.. code-block:: C++
+   :caption: Simple ``switch``
+   :name: convergence-example-switch
+
+   void foo() {
+     ... // A
+     switch (value) {
+       case 1:
+         ... // C1
+         break;
+       case 2:
+         ... // C2
+         [[fall_through]]
+       case 3:
+         ... // C3
+     }
+   }
+
+In :numref:`convergence-example-switch`, consider threads that are converged at
+the ``switch`` statement:
+
+- Threads that jump to ``case 1`` (respectively, ``case 2`` and ``case 3``) are
+  converged at ``C1`` (respectively, ``C2`` and ``C3``).
+- Threads that jump to ``case 2`` fall-through to ``case 3``. These threads
+  are converged with threads that directly jump to ``case 3``.
+
+.. code-block:: C++
+   :caption: Backwards ``goto``
+   :name: convergence-example-backwards-goto
+
+   void foo() {
+     ... // A
+     if (cond1)
+       goto inside_loop; // G1
+     ... // B
+     loop:
+       ... // L1
+     inside_loop:
+       ... // L2
+       if (cond3) { // L3
+         ... // L4
+         goto outside_loop; // G2
+       }
+       ... // L5
+     if (cond) // L6
+       goto loop; // G3
+     ... // C
+     outside_loop:
+     ... // D
+   }
+
+:numref:`convergence-example-backwards-goto` shows a cycle similar to the one 
in
+:numref:`convergence-example-jump-into-loop`, except this cycle is created by a
+backwards ``goto`` instead of a ``while`` statement.
+
+- The convergence of threads is implementation-defined in the span of the
+  ``goto`` statement ``G3``, which includes ``L1``, ``L2``, ``L3``, ``L5`` and
+  ``L6``.
+- Threads that are converged at ``L3`` are converged at ``L4`` and ``G2`` if
+  they enter the branch.
+- Threads that are converged at the start of the function are converged at
+  ``C``. This includes thread that jumped to ``inside_loop`` as well as threads
+  that reached ``loop`` after executing ``B``.
+- Threads that are converged at the start of the function are converged at
+  ``outside_loop``. This includes threads that jumped from ``G2`` as well as
+  threads that  reached ``outside_loop`` after executing ``C``.
+
+
+Implementation-defined Convergence
+==================================
+
+Implementation-defined convergence is in the context of each execution of a
+function body, corresponding to a distinct execution of a call to that 
function.
+An implementation may not converge two threads that enter the same function 
body
+by executing distinct calls to that function. If those two function calls were
+inlined, the resulting evaluations would correspond to distinct copies of the
+same expressions in the inlined function bodies. Note that
+implementation-defined convergence is still constrained in two ways:
+
+- The strict partial order imposed by *convergence-before*, and
+- The convergence at substatements inside a statement ``S`` imposed by
+  :ref:`sequential execution<convergence-sequential-execution>` on threads that
+  are converged at ``S``.
+
+`Maximal convergence in LLVM IR`__ is an example of implementation-defined
+convergence.
+
+__ https://llvm.org/docs/ConvergenceAndUniformity.html#maximal-convergence
+
+Limitation: Loops in LLVM IR
+============================
+
+Reference -- `Evolving "convergent": Lessons from Control Flow in AMDGPU
+<https://llvm.org/devmtg/2020-09/program/>`_ - Nicolai Haehnle, LLVM 
Developers'
+Meeting, October 2020.
+
+Ambiguity in a Simplified CFG
+-----------------------------
+
+The representation of loops in LLVM IR may lose information about the intended
+convergence in a program when the control-flow graph is simplified. This 
happens
+when loop structures in the language source that differ in the implied
+convergence, are considered equivalent in the CFG.
+
+.. code-block:: C++
+   :caption: Different loops with the same single-threaded execution
+   :name: convergence-ambiguity-source
+
+   void loop_continue() {
+     ... // A
+     for (;;) {
+       ... // B
+       if (cond1)
+         continue;
+       ... // C
+       if (cond2)
+         continue;
+       break;
+     }
+     ... // D
+   }
+
+   void loop_nest() {
+     ... // A
+     do {
+       do {
+         ... // B
+       } while (cond1);
+       ... // C
+     } while (cond2);
+     ... // D
+   }
+
+:numref:`convergence-ambiguity-source` shows two different loop statements that
+have identical semantics in a single-threaded environment. But in a
+multi-threaded environment, the convergence of threads is different for these
+two statements.
+
+In function ``loop_continue()``, threads that evaluate either ``cond1`` or
+``cond2`` to be ``true`` converge at the start of the ``for`` statement for the
+next iteration. An execution may produce the following example trace of
+converged evaluations.
+
+.. table::
+   :align: left
+
+   +----------+----+----+----+----+----+----+
+   |          | 1  | 2  | 3  | 4  | 5  | 6  |
+   +----------+----+----+----+----+----+----+
+   | Thread 1 | A1 | B1 |    | B3 | C1 | D1 |
+   +----------+----+----+----+----+----+----+
+   | Thread 2 | A2 | B2 | C2 | B4 | C3 | D2 |
+   +----------+----+----+----+----+----+----+
+
+But in function ``loop_nest()``, threads that evaluate ``cond1`` to be true
+continue to execute the inner ``do`` statement convergently until the condition
+becomes ``false``. All threads then proceed to execute ``C`` and then evaluate
+``cond2``. An equivalent execution produces the following different trace of
+converged evaluations.
+
+.. table::
+   :align: left
+
+   +----------+----+----+----+----+----+----+----+
+   |          | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
+   +----------+----+----+----+----+----+----+----+
+   | Thread 1 | A1 | B1 | B3 | C1 |    |    | D1 |
+   +----------+----+----+----+----+----+----+----+
+   | Thread 2 | A2 | B2 |    | C2 | B4 | C3 | D2 |
+   +----------+----+----+----+----+----+----+----+
+
+But both loop statements can result in the same control-flow graph after
+simplification in the LLVM IR as shown in :numref:`convergence-ambiguity-cfg`,
+thus making convergence ambiguous in an optimizing compiler.
+
+.. code-block:: none
+   :caption: Canonicalized Loops
+   :name: convergence-ambiguity-cfg
+
+    +-----+
+    | A   |
+    +-+---+
+      |
+      v
+    +-----+
+    | B   |<---+
+    +-+-+-+    |
+      |  \-----+
+      v        |
+    +-----+    |
+    | C   |    |
+    +-+-+-+    |
+      |  \-----+
+      v
+    +-----+
+    | D   |
+    +-+---+
+
+SimplifyCFG in the LLVM optimizer is an example transform that can produce this
+canonicalization. This can be prevented if there was some way to a mark loop
+header that should not be merged into its predecessor or successor.
+
+One way to achieve this is to insert some operation with unknown side-effects 
so
+that the optimizer can no longer merge these blocks. But this is clearly a
+workaround for the fundamental problem that LLVM IR does not have sufficient
+semantics to represent convergence. A better solution is the use of 
`convergence
+control tokens`__ which are currently an experimental feature in LLVM IR.
+
+__ https://llvm.org/docs/ConvergentOperations.html
+
+Divergent Loop Exits
+--------------------
+
+.. code-block:: C++
+   :caption: Loop with a conditional break
+   :name: convergence-divergent-exit-source
+
+   void loop_continue() {
+     ... // A
+     for (...) {
+       ... // B
+       if (cond) {
+         ... // C
+         break;
+       }
+     }
+     ... // D
+   }
+
+:numref:`convergence-divergent-exit-source` shows an iteration statement with a
+``break`` that occurs inside a condition. When this condition is `divergent`__,
+different threads that are converged within the iteration statement execute
+``C`` on different iterations, and then reach ``D``. All such threads are
+converged at ``D``, but not at the respective execution of ``C`` in different
+iterations. An execution may produce the following example trace of
+converged evaluations.
+
+.. table::
+   :align: left
+
+   +----------+----+----+----+----+----+----+
+   |          | 1  | 2  | 3  | 4  | 5  | 6  |
+   +----------+----+----+----+----+----+----+
+   | Thread 1 | A1 | B1 |    | C1 |    | D1 |
+   +----------+----+----+----+----+----+----+
+   | Thread 2 | A2 | B2 | B4 |    | C2 | D2 |
+   +----------+----+----+----+----+----+----+
+
+__ https://llvm.org/docs/ConvergenceAndUniformity.html
+
+.. code-block:: none
+   :caption: Divergent loop exit in LLVM IR
+   :name: convergence-divergent-exit-cfg
+
+    +-----+
+    | A   |
+    +-+---+
+      |
+      v
+    +-----+
+    | B   |<---+
+    +-+-+-+    |
+      |  \-----+
+      v
+    +-----+
+    | C   |
+    +-+-+-+
+      |
+      v
+    +-----+
+    | D   |
+    +-+---+
+
+:numref:`convergence-divergent-exit-cfg` shows the resulting natural loop in
+LLVM IR, where this divergent execution of ``C`` is lost. In the LLVM optimizer
+and code generator, the block ``C`` is no longer part of the natural loop 
headed
+by ``B``, although it was lexically inside the corresponding iteration 
statement
+in the source code. As a result, the implementation causes all threads that 
exit
+the loop to converge at ``C``, when in fact they should converge at ``D``. An
+equivalent execution produces the following trace of converged evaluations.
+
+.. table::
+   :align: left
+
+   +----------+----+----+----+----+----+
+   |          | 1  | 2  | 3  | 4  | 5  |
+   +----------+----+----+----+----+----+
+   | Thread 1 | A1 | B1 |    | C1 | D1 |
+   +----------+----+----+----+----+----+
+   | Thread 2 | A2 | B2 | B4 | C2 | D2 |
+   +----------+----+----+----+----+----+
+
+The only way to represent this correctly is using the experimental feature for
+`convergence control tokens`__.
+
+__ https://llvm.org/docs/ConvergentOperations.html
diff --git a/clang/docs/conf.py b/clang/docs/conf.py
index 4cee382a718fa..8cbf7e06b5579 100644
--- a/clang/docs/conf.py
+++ b/clang/docs/conf.py
@@ -94,6 +94,10 @@
 .. |ReleaseNotesTitle| replace:: {in_progress_title} Release Notes
 """
 
+# -- General options for output 
------------------------------------------------
+
+numfig = True
+
 # -- Options for HTML output 
---------------------------------------------------
 
 # The theme to use for HTML and HTML Help pages.  See the documentation for
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index 6c792af66a62c..87cfada71d105 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -56,6 +56,7 @@ Using Clang as a Compiler
    OpenMPSupport
    SYCLSupport
    HIPSupport
+   ThreadConvergence
    HLSL/HLSLDocs
    ThinLTO
    APINotes
diff --git a/clang/include/clang/AST/ParentMap.h 
b/clang/include/clang/AST/ParentMap.h
index 86e2f048a3445..63e9be262f08f 100644
--- a/clang/include/clang/AST/ParentMap.h
+++ b/clang/include/clang/AST/ParentMap.h
@@ -13,6 +13,8 @@
 #ifndef LLVM_CLANG_AST_PARENTMAP_H
 #define LLVM_CLANG_AST_PARENTMAP_H
 
+#include <utility>
+
 namespace clang {
 class Stmt;
 class Expr;
@@ -23,17 +25,21 @@ class ParentMap {
   ParentMap(Stmt* ASTRoot);
   ~ParentMap();
 
+  using ValueT = std::pair<Stmt *, unsigned>;
+
   /// Adds and/or updates the parent/child-relations of the complete
   /// stmt tree of S. All children of S including indirect descendants are
   /// visited and updated or inserted but not the parents of S.
-  void addStmt(Stmt* S);
+  void addStmt(Stmt *S, unsigned Depth);
 
   /// Manually sets the parent of \p S to \p Parent.
   ///
   /// If \p S is already in the map, this method will update the mapping.
   void setParent(const Stmt *S, const Stmt *Parent);
 
+  ValueT lookup(Stmt *) const;
   Stmt *getParent(Stmt*) const;
+  unsigned getParentDepth(Stmt *) const;
   Stmt *getParentIgnoreParens(Stmt *) const;
   Stmt *getParentIgnoreParenCasts(Stmt *) const;
   Stmt *getParentIgnoreParenImpCasts(Stmt *) const;
@@ -43,6 +49,10 @@ class ParentMap {
     return getParent(const_cast<Stmt*>(S));
   }
 
+  unsigned getParentDepth(const Stmt *S) const {
+    return getParentDepth(const_cast<Stmt *>(S));
+  }
+
   const Stmt *getParentIgnoreParens(const Stmt *S) const {
     return getParentIgnoreParens(const_cast<Stmt*>(S));
   }
@@ -60,5 +70,5 @@ class ParentMap {
   }
 };
 
-} // end clang namespace
+} // end namespace clang
 #endif
diff --git a/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h 
b/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
new file mode 100644
index 0000000000000..bf0d164c6a5bc
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
@@ -0,0 +1,25 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Analyse implicit convergence in the CFG.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_CONVERGENCECHECK_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_CONVERGENCECHECK_H
+
+namespace clang {
+class AnalysisDeclContext;
+class Sema;
+class Stmt;
+
+void analyzeForConvergence(Sema &S, AnalysisDeclContext &AC);
+
+} // end namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_CONVERGENCECHECK_H
diff --git a/clang/include/clang/Analysis/CFG.h 
b/clang/include/clang/Analysis/CFG.h
index a7ff38c786a8f..b356841243e69 100644
--- a/clang/include/clang/Analysis/CFG.h
+++ b/clang/include/clang/Analysis/CFG.h
@@ -139,6 +139,8 @@ class CFGStmt : public CFGElement {
     return static_cast<const Stmt *>(Data1.getPointer());
   }
 
+  Stmt *getStmt() { return static_cast<Stmt *>(Data1.getPointer()); }
+
 private:
   friend class CFGElement;
 
diff --git a/clang/include/clang/Basic/AttrDocs.td 
b/clang/include/clang/Basic/AttrDocs.td
index 29d1d2c69e824..5f37922d352b7 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -1675,15 +1675,13 @@ of the condition.
 def ConvergentDocs : Documentation {
   let Category = DocCatFunction;
   let Content = [{
-The ``convergent`` attribute can be placed on a function declaration. It is
-translated into the LLVM ``convergent`` attribute, which indicates that the 
call
-instructions of a function with this attribute cannot be made control-dependent
-on any additional values.
-
-This attribute is different from ``noduplicate`` because it allows duplicating
-function calls if it can be proved that the duplicated function calls are
-not made control-dependent on any additional values, e.g., unrolling a loop
-executed by all work items.
+The ``convergent`` attribute can be placed on a function declaration to 
indicate
+that every call to this function should be treated as a
+:ref:`convergent operation<convergent-operation>`.
+
+This attribute is different from ``noduplicate``. In general, ``convergent``
+calls can be duplicated if each copy retains the threads that are converged at
+each execution of the original call.
 
 Sample usage:
 
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td 
b/clang/include/clang/Basic/DiagnosticGroups.td
index b234d60fee8fc..bcf0bb6d3b8c6 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1504,6 +1504,9 @@ def HIPOpenMPOffloading: 
DiagGroup<"hip-omp-target-directives">;
 // Warning about multiple GPUs are detected.
 def MultiGPU: DiagGroup<"multi-gpu">;
 
+// A warning group for thread convergence.
+def Convergence : DiagGroup<"convergence">;
+
 // Warnings which cause linking of the runtime libraries like
 // libc and the CRT to be skipped.
 def AVRRtlibLinkingQuirks : DiagGroup<"avr-rtlib-linking-quirks">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 6cbe8b60fe9bf..dabb6d31b519a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -6510,6 +6510,10 @@ def warn_unused_label : Warning<"unused label %0">,
 def err_continue_from_cond_var_init : Error<
   "cannot jump from this continue statement to the loop increment; "
   "jump bypasses initialization of loop condition variable">;
+def note_goto_affects_convergence : Note<
+  "jump from this goto statement affects convergence">;
+def note_switch_case_affects_convergence : Note<
+  "jump to this case statement affects convergence of loop">;
 def err_goto_into_protected_scope : Error<
   "cannot jump from this goto statement to its label">;
 def ext_goto_into_protected_scope : ExtWarn<
@@ -10513,6 +10517,12 @@ def warn_loop_ctrl_binds_to_inner : Warning<
 def err_omp_bind_required_on_loop : Error<
   "expected 'bind' clause for 'loop' construct without an enclosing OpenMP "
   "construct">;
+def warn_loop_side_entry_affects_convergence : Warning<
+  "jump enters an iteration statement; convergence is implementation-defined">,
+  InGroup<Convergence>, DefaultIgnore;
+def warn_cycle_created_by_goto_affects_convergence : Warning<
+  "convergence is implementation-defined due to a backwards goto">,
+  InGroup<Convergence>, DefaultIgnore;
 def err_omp_loop_reduction_clause : Error<
   "'reduction' clause not allowed with '#pragma omp loop bind(teams)'">;
 def warn_break_binds_to_switch : Warning<
diff --git a/clang/lib/AST/ParentMap.cpp b/clang/lib/AST/ParentMap.cpp
index e62e71bf5a514..7b8a970a99cff 100644
--- a/clang/lib/AST/ParentMap.cpp
+++ b/clang/lib/AST/ParentMap.cpp
@@ -13,19 +13,20 @@
 #include "clang/AST/ParentMap.h"
 #include "clang/AST/Decl.h"
 #include "clang/AST/Expr.h"
+#include "clang/AST/ExprCXX.h"
 #include "clang/AST/StmtObjC.h"
 #include "llvm/ADT/DenseMap.h"
 
 using namespace clang;
 
-typedef llvm::DenseMap<Stmt*, Stmt*> MapTy;
+typedef llvm::DenseMap<Stmt *, ParentMap::ValueT> MapTy;
 
 enum OpaqueValueMode {
   OV_Transparent,
   OV_Opaque
 };
 
-static void BuildParentMap(MapTy& M, Stmt* S,
+static void BuildParentMap(MapTy &M, Stmt *S, unsigned ParentDepth,
                            OpaqueValueMode OVMode = OV_Transparent) {
   if (!S)
     return;
@@ -35,23 +36,23 @@ static void BuildParentMap(MapTy& M, Stmt* S,
     PseudoObjectExpr *POE = cast<PseudoObjectExpr>(S);
     Expr *SF = POE->getSyntacticForm();
 
-    auto [Iter, Inserted] = M.try_emplace(SF, S);
+    auto [Iter, Inserted] = M.try_emplace(SF, S, ParentDepth);
     if (!Inserted) {
       // Nothing more to do in opaque mode if we are updating an existing map.
       if (OVMode == OV_Opaque)
         break;
       // Update the entry in transparent mode, and clear existing state.
-      Iter->second = S;
+      Iter->second = {S, ParentDepth};
       for (Stmt *SubStmt : S->children())
         M.erase(SubStmt);
     }
-    BuildParentMap(M, SF, OV_Transparent);
+    BuildParentMap(M, SF, ParentDepth + 1, OV_Transparent);
 
     for (PseudoObjectExpr::semantics_iterator I = POE->semantics_begin(),
                                               E = POE->semantics_end();
          I != E; ++I) {
-      M[*I] = S;
-      BuildParentMap(M, *I, OV_Opaque);
+      M[*I] = {S, ParentDepth + 1};
+      BuildParentMap(M, *I, ParentDepth + 1, OV_Opaque);
     }
     break;
   }
@@ -59,17 +60,17 @@ static void BuildParentMap(MapTy& M, Stmt* S,
     assert(OVMode == OV_Transparent && "Should not appear alongside OVEs");
     BinaryConditionalOperator *BCO = cast<BinaryConditionalOperator>(S);
 
-    M[BCO->getCommon()] = S;
-    BuildParentMap(M, BCO->getCommon(), OV_Transparent);
+    M[BCO->getCommon()] = {S, ParentDepth + 1};
+    BuildParentMap(M, BCO->getCommon(), ParentDepth + 1, OV_Transparent);
 
-    M[BCO->getCond()] = S;
-    BuildParentMap(M, BCO->getCond(), OV_Opaque);
+    M[BCO->getCond()] = {S, ParentDepth + 1};
+    BuildParentMap(M, BCO->getCond(), ParentDepth + 1, OV_Opaque);
 
-    M[BCO->getTrueExpr()] = S;
-    BuildParentMap(M, BCO->getTrueExpr(), OV_Opaque);
+    M[BCO->getTrueExpr()] = {S, ParentDepth + 1};
+    BuildParentMap(M, BCO->getTrueExpr(), ParentDepth + 1, OV_Opaque);
 
-    M[BCO->getFalseExpr()] = S;
-    BuildParentMap(M, BCO->getFalseExpr(), OV_Transparent);
+    M[BCO->getFalseExpr()] = {S, ParentDepth + 1};
+    BuildParentMap(M, BCO->getFalseExpr(), ParentDepth + 1, OV_Transparent);
 
     break;
   }
@@ -81,33 +82,33 @@ static void BuildParentMap(MapTy& M, Stmt* S,
     // parent, then not reassign that when traversing the semantic expressions.
     OpaqueValueExpr *OVE = cast<OpaqueValueExpr>(S);
     Expr *SrcExpr = OVE->getSourceExpr();
-    auto [Iter, Inserted] = M.try_emplace(SrcExpr, S);
+    auto [Iter, Inserted] = M.try_emplace(SrcExpr, S, ParentDepth);
     // Force update in transparent mode.
     if (!Inserted && OVMode == OV_Transparent) {
-      Iter->second = S;
+      Iter->second = {S, ParentDepth};
       Inserted = true;
     }
     if (Inserted)
-      BuildParentMap(M, SrcExpr, OV_Transparent);
+      BuildParentMap(M, SrcExpr, ParentDepth + 1, OV_Transparent);
     break;
   }
   case Stmt::CapturedStmtClass:
     for (Stmt *SubStmt : S->children()) {
       if (SubStmt) {
-        M[SubStmt] = S;
-        BuildParentMap(M, SubStmt, OVMode);
+        M[SubStmt] = {S, ParentDepth + 1};
+        BuildParentMap(M, SubStmt, ParentDepth + 1, OVMode);
       }
     }
     if (Stmt *SubStmt = cast<CapturedStmt>(S)->getCapturedStmt()) {
-      M[SubStmt] = S;
-      BuildParentMap(M, SubStmt, OVMode);
+      M[SubStmt] = {S, ParentDepth + 1};
+      BuildParentMap(M, SubStmt, ParentDepth + 1, OVMode);
     }
     break;
   default:
     for (Stmt *SubStmt : S->children()) {
       if (SubStmt) {
-        M[SubStmt] = S;
-        BuildParentMap(M, SubStmt, OVMode);
+        M[SubStmt] = {S, ParentDepth + 1};
+        BuildParentMap(M, SubStmt, ParentDepth + 1, OVMode);
       }
     }
     break;
@@ -117,7 +118,7 @@ static void BuildParentMap(MapTy& M, Stmt* S,
 ParentMap::ParentMap(Stmt *S) : Impl(nullptr) {
   if (S) {
     MapTy *M = new MapTy();
-    BuildParentMap(*M, S);
+    BuildParentMap(*M, S, 0);
     Impl = M;
   }
 }
@@ -126,9 +127,9 @@ ParentMap::~ParentMap() {
   delete (MapTy*) Impl;
 }
 
-void ParentMap::addStmt(Stmt* S) {
+void ParentMap::addStmt(Stmt *S, unsigned Depth) {
   if (S) {
-    BuildParentMap(*(MapTy*) Impl, S);
+    BuildParentMap(*(MapTy *)Impl, S, Depth);
   }
 }
 
@@ -136,14 +137,19 @@ void ParentMap::setParent(const Stmt *S, const Stmt 
*Parent) {
   assert(S);
   assert(Parent);
   MapTy *M = reinterpret_cast<MapTy *>(Impl);
-  M->insert(std::make_pair(const_cast<Stmt *>(S), const_cast<Stmt *>(Parent)));
+  M->try_emplace(const_cast<Stmt *>(S), const_cast<Stmt *>(Parent),
+                 getParentDepth(Parent) + 1);
 }
 
-Stmt* ParentMap::getParent(Stmt* S) const {
+ParentMap::ValueT ParentMap::lookup(Stmt *S) const {
   MapTy* M = (MapTy*) Impl;
   return M->lookup(S);
 }
 
+Stmt *ParentMap::getParent(Stmt *S) const { return lookup(S).first; }
+
+unsigned ParentMap::getParentDepth(Stmt *S) const { return lookup(S).second; }
+
 Stmt *ParentMap::getParentIgnoreParens(Stmt *S) const {
   do {
     S = getParent(S);
@@ -221,4 +227,3 @@ bool ParentMap::isConsumedExpr(Expr* E) const {
       return true;
   }
 }
-
diff --git a/clang/lib/Analysis/AnalysisDeclContext.cpp 
b/clang/lib/Analysis/AnalysisDeclContext.cpp
index d0b663bd94580..8214daee89324 100644
--- a/clang/lib/Analysis/AnalysisDeclContext.cpp
+++ b/clang/lib/Analysis/AnalysisDeclContext.cpp
@@ -280,7 +280,7 @@ ParentMap &AnalysisDeclContext::getParentMap() {
     PM.reset(new ParentMap(getBody()));
     if (const auto *C = dyn_cast<CXXConstructorDecl>(getDecl())) {
       for (const auto *I : C->inits()) {
-        PM->addStmt(I->getInit());
+        PM->addStmt(I->getInit(), 0);
       }
     }
     if (builtCFG)
diff --git a/clang/lib/Analysis/CMakeLists.txt 
b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..254fe4fde3ff6 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -19,6 +19,7 @@ add_clang_library(clangAnalysis
   Dominators.cpp
   ExprMutationAnalyzer.cpp
   FixitUtil.cpp
+  ConvergenceCheck.cpp
   IntervalPartition.cpp
   IssueHash.cpp
   LiveVariables.cpp
diff --git a/clang/lib/Analysis/ConvergenceCheck.cpp 
b/clang/lib/Analysis/ConvergenceCheck.cpp
new file mode 100644
index 0000000000000..75139388ea19e
--- /dev/null
+++ b/clang/lib/Analysis/ConvergenceCheck.cpp
@@ -0,0 +1,119 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Analyses/ConvergenceCheck.h"
+#include "clang/AST/ParentMap.h"
+#include "clang/AST/Stmt.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Sema/SemaInternal.h"
+#include "llvm/ADT/DepthFirstIterator.h"
+
+using namespace clang;
+using namespace llvm;
+
+static void warnGotoCycle(Sema &S, Stmt *From, Stmt *Parent) {
+  S.Diag(Parent->getBeginLoc(),
+         diag::warn_cycle_created_by_goto_affects_convergence);
+  S.Diag(From->getBeginLoc(), diag::note_goto_affects_convergence);
+}
+
+static void warnJumpIntoLoop(Sema &S, Stmt *From, Stmt *Loop) {
+  S.Diag(Loop->getBeginLoc(), diag::warn_loop_side_entry_affects_convergence);
+  S.Diag(From->getBeginLoc(), diag::note_goto_affects_convergence);
+}
+
+static void checkConvergenceOnGoto(Sema &S, GotoStmt *From, ParentMap &PM) {
+  Stmt *To = From->getLabel()->getStmt();
+
+  unsigned ToDepth = PM.getParentDepth(To) + 1;
+  unsigned FromDepth = PM.getParentDepth(From) + 1;
+  Stmt *ExpandedTo = To;
+  Stmt *ExpandedFrom = From;
+  while (ToDepth > FromDepth) {
+    std::tie(ExpandedTo, ToDepth) = PM.lookup(ExpandedTo);
+  }
+  while (FromDepth > ToDepth) {
+    std::tie(ExpandedFrom, FromDepth) = PM.lookup(ExpandedFrom);
+  }
+
+  // Special case: the goto statement is a descendant of the label statement.
+  if (ExpandedFrom == ExpandedTo) {
+    assert(ExpandedTo == To);
+    warnGotoCycle(S, From, To);
+    return;
+  }
+
+  Stmt *ParentFrom = PM.getParent(ExpandedFrom);
+  Stmt *ParentTo = PM.getParent(ExpandedTo);
+  while (ParentFrom != ParentTo) {
+    assert(ParentFrom && ParentTo);
+    ExpandedFrom = ParentFrom;
+    ParentFrom = PM.getParent(ExpandedFrom);
+    ExpandedTo = ParentTo;
+    ParentTo = PM.getParent(ExpandedTo);
+  }
+
+  SmallVector<Stmt *> Loops;
+  for (Stmt *I = To; I != ParentFrom; I = PM.getParent(I)) {
+    // Can't jump into a ranged-for, so we don't need to look for it here.
+    if (isa<ForStmt, WhileStmt, DoStmt>(I))
+      Loops.push_back(I);
+  }
+  for (Stmt *I : reverse(Loops))
+    warnJumpIntoLoop(S, From, I);
+
+  bool ToFoundFirst = false;
+  for (Stmt *Child : ParentFrom->children()) {
+    if (Child == ExpandedFrom)
+      break;
+    if (Child == ExpandedTo) {
+      ToFoundFirst = true;
+      break;
+    }
+  }
+
+  if (ToFoundFirst) {
+    warnGotoCycle(S, From, To);
+  }
+}
+
+static void warnSwitchIntoLoop(Sema &S, Stmt *Case, Stmt *Loop) {
+  S.Diag(Loop->getBeginLoc(), diag::warn_loop_side_entry_affects_convergence);
+  S.Diag(Case->getBeginLoc(), diag::note_switch_case_affects_convergence);
+}
+
+static void checkConvergenceForSwitch(Sema &S, SwitchStmt *Switch,
+                                      ParentMap &PM) {
+  for (SwitchCase *Case = Switch->getSwitchCaseList(); Case;
+       Case = Case->getNextSwitchCase()) {
+    SmallVector<Stmt *> Loops;
+    for (Stmt *I = Case; I != Switch; I = PM.getParent(I)) {
+      // Can't jump into a ranged-for, so we don't need to look for it here.
+      if (isa<ForStmt, WhileStmt, DoStmt>(I))
+        Loops.push_back(I);
+    }
+    for (Stmt *I : reverse(Loops))
+      warnSwitchIntoLoop(S, Case, I);
+  }
+}
+
+void clang::analyzeForConvergence(Sema &S, AnalysisDeclContext &AC) {
+  // Iterating over the CFG helps trim unreachable blocks, and locates Goto
+  // statements faster than iterating over the whole body.
+  CFG *cfg = AC.getCFG();
+  assert(cfg);
+  ParentMap &PM = AC.getParentMap();
+  for (CFGBlock *BI : *cfg) {
+    Stmt *Term = BI->getTerminatorStmt();
+    if (GotoStmt *Goto = dyn_cast_or_null<GotoStmt>(Term)) {
+      checkConvergenceOnGoto(S, Goto, PM);
+    } else if (SwitchStmt *Switch = dyn_cast_or_null<SwitchStmt>(Term)) {
+      checkConvergenceForSwitch(S, Switch, PM);
+    }
+  }
+}
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp 
b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 34045a7274021..31756d3a2f75a 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -29,6 +29,7 @@
 #include "clang/Analysis/Analyses/CFGReachabilityAnalysis.h"
 #include "clang/Analysis/Analyses/CalledOnceCheck.h"
 #include "clang/Analysis/Analyses/Consumed.h"
+#include "clang/Analysis/Analyses/ConvergenceCheck.h"
 #include "clang/Analysis/Analyses/ReachableCode.h"
 #include "clang/Analysis/Analyses/ThreadSafety.h"
 #include "clang/Analysis/Analyses/UninitializedValues.h"
@@ -2666,7 +2667,7 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
     return;
   }
 
-  const Stmt *Body = D->getBody();
+  Stmt *Body = D->getBody();
   assert(Body);
 
   // Construct the analysis context with the specified CFG build options.
@@ -2865,6 +2866,10 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
       if (S.getLangOpts().CPlusPlus && !fscope->isCoroutine() && 
isNoexcept(FD))
         checkThrowInNonThrowingFunc(S, FD, AC);
 
+  if (!Diags.isIgnored(diag::warn_cycle_created_by_goto_affects_convergence,
+                       D->getBeginLoc()))
+    analyzeForConvergence(S, AC);
+
   // If none of the previous checks caused a CFG build, trigger one here
   // for the logical error handler.
   if (LogicalErrorHandler::hasActiveDiagnostics(Diags, D->getBeginLoc())) {
diff --git a/clang/test/SemaHIP/convergence-warnings.hip 
b/clang/test/SemaHIP/convergence-warnings.hip
new file mode 100644
index 0000000000000..a0833a4cfdcb7
--- /dev/null
+++ b/clang/test/SemaHIP/convergence-warnings.hip
@@ -0,0 +1,473 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wconvergence %s
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ int jump_into_nest(int cond) {
+  int i = 0;
+
+  if (cond > 1) {
+    goto jumptarget; // #GOTO11
+  }
+
+  foo();
+
+  while (true) { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                 // expected-note@#GOTO11 {{jump from this goto statement 
affects convergence}}
+    foo();
+    if (cond > 1) {
+      do { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+           // expected-note@#GOTO11 {{jump from this goto statement affects 
convergence}}
+        i++;
+        foo();
+      jumptarget:
+        i++;
+      } while (true);
+      foo();
+    }
+  }
+
+  return i;
+}
+
+__device__ int jump_into_unreachable_nest(int cond) {
+  int i = 0;
+
+  foo();
+
+  goto jumptarget; // #GOTO43
+
+  while (true) { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                 // expected-note@#GOTO43 {{jump from this goto statement 
affects convergence}}
+    foo();
+    if (cond > 1) {
+      do { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+           // expected-note@#GOTO43 {{jump from this goto statement affects 
convergence}}
+        i++;
+        foo();
+      jumptarget:
+        i++;
+      } while (true);
+      foo();
+    }
+  }
+
+  return i;
+}
+
+__device__ int backwards_jump_into_nest(int cond) {
+  int i = 0;
+
+  while (true) {// expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                // expected-note@#GOTO47 {{jump from this goto statement 
affects convergence}}
+    foo();
+    if (cond > 1) {
+      do {// expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+          // expected-note@#GOTO47 {{jump from this goto statement affects 
convergence}}
+        i++;
+        foo();
+      jumptarget:
+        // expected-warning@-1 {{convergence is implementation-defined due to 
a backwards goto}}
+        // expected-note@#GOTO47 {{jump from this goto statement affects 
convergence}}
+        i++;
+      } while (true);
+      foo();
+    }
+  }
+
+  if (cond > 1) {
+    goto jumptarget; // #GOTO47
+  }
+
+  return i;
+}
+
+__device__ int forward_all_kinds(int cond1, int cond2, int cond3, int cond4, 
int cond5) {
+  int i = 0;
+
+  if (cond1 < 0)
+    goto jumptarget; // #GOTO57
+
+  for (; i < cond5;) {// expected-warning {{jump enters an iteration 
statement; convergence is implementation-defined}}
+            // expected-note@#GOTO57 {{jump from this goto statement affects 
convergence}}
+    foo();
+    if (cond2 != 0) {
+      while (i < cond3) { // expected-warning {{jump enters an iteration 
statement; convergence is implementation-defined}}
+                     // expected-note@#GOTO57 {{jump from this goto statement 
affects convergence}}
+        foo();
+        if (cond4 > 1) {
+          do { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+               // expected-note@#GOTO57 {{jump from this goto statement 
affects convergence}}
+            foo();
+            i++;
+          jumptarget:
+            i++;
+          } while (i < cond4);
+        }
+      }
+    } else {
+      while (i < cond3) {
+        foo();
+        i++;
+      }
+    }
+  }
+  return i;
+}
+
+__device__ int forever_loops(int cond1, int cond2, int cond3) {
+  int i = 0;
+
+  if (cond1 < 0)
+    goto jumptarget; // #GOTO105
+
+  for (;;) {// expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+            // expected-note@#GOTO105 {{jump from this goto statement affects 
convergence}}
+    foo();
+    while (true) { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                     // expected-note@#GOTO105 {{jump from this goto statement 
affects convergence}}
+    jumptarget:
+      i++;
+      foo();
+    }
+  }
+  return i;
+}
+
+__device__ int nest_to_nest(int cond) {
+  int i = 0;
+
+  for (;;) {
+    foo();
+    if (cond != 0) {
+      while (true) {
+        foo();
+        if (cond > 1) {
+          goto jumptarget; // #GOTO89
+        }
+      }
+    }
+  }
+
+  while (true) { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                 // expected-note@#GOTO89 {{jump from this goto statement 
affects convergence}}
+    foo();
+    if (cond > 1) {
+      do { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+           // expected-note@#GOTO89 {{jump from this goto statement affects 
convergence}}
+        foo();
+        i++;
+      jumptarget:
+        i++;
+      } while (true);
+    }
+  }
+
+  return i;
+}
+
+__device__ int backwards_nest_to_nest(int cond) {
+  int i = 0;
+
+  while (true) { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+                 // expected-note@#GOTO131 {{jump from this goto statement 
affects convergence}}
+    foo();
+    if (cond > 1) {
+      do { // expected-warning {{jump enters an iteration statement; 
convergence is implementation-defined}}
+           // expected-note@#GOTO131 {{jump from this goto statement affects 
convergence}}
+        foo();
+        i++;
+      jumptarget:
+        // expected-warning@-1 {{convergence is implementation-defined due to 
a backwards goto}}
+        // expected-note@#GOTO131 {{jump from this goto statement affects 
convergence}}
+        i++;
+      } while (true);
+    }
+  }
+
+  for (;;) {
+    foo();
+    if (cond != 0) {
+      while (true) {
+        foo();
+        if (cond > 1) {
+          goto jumptarget; // #GOTO131
+        }
+      }
+    }
+  }
+
+  return i;
+}
+
+__device__ int backwards_and_out(int cond) {
+  int i = 0;
+
+  if (cond > 1) {
+    foo();
+  jumptarget:
+    // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+    // expected-note@#GOTO154 {{jump from this goto statement affects 
convergence}}
+    foo();
+    i++;
+  }
+
+  for (;;) {
+    foo();
+    if (cond != 0) {
+      while (true) {
+        foo();
+        if (cond > 1) {
+          goto jumptarget; // #GOTO154
+        }
+      }
+    }
+  }
+
+  return i;
+}
+
+__device__ int backwards_inside_loop(int cond) {
+  int i = 0;
+  int j = 0;
+  int k = 0;
+  for (;;) {
+    foo();
+    i++;
+  jumptarget:
+    // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+    // expected-note@#GOTO175 {{jump from this goto statement affects 
convergence}}
+    foo();
+    j++;
+    k++;
+    if (cond > 5)
+      goto jumptarget; // #GOTO175
+  }
+  return i + j + k;
+}
+
+__device__ int loop_backwards_loop(int cond) {
+  int i = 0;
+  int j = 0;
+  int k = 0;
+  for (;;) {
+    foo();
+    i++;
+  jumptarget:
+    // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+    // expected-note@#GOTO193 {{jump from this goto statement affects 
convergence}}
+    j++;
+    while (true) {
+      foo();
+      k++;
+      if (cond > 5)
+        goto jumptarget; // #GOTO193
+    }
+  }
+  return i + j + k;
+}
+
+__device__ int backwards_inside_label(int cond) {
+  int i = 0;
+  int j = 0;
+  jumptarget: {
+  // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+  // expected-note@#GOTO208 {{jump from this goto statement affects 
convergence}}
+    foo();
+    i++;
+    j++;
+    if (cond > 5)
+      goto jumptarget; // #GOTO208
+  }
+  return i + j;
+}
+
+__device__ int backwards_label_inside_branch(int cond) {
+  int i = 0;
+  int j = 0;
+  if (cond > 0) {
+  jumptarget: {
+  // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+  // expected-note@#GOTO223 {{jump from this goto statement affects 
convergence}}
+      foo();
+      i++;
+      j++;
+      if (cond > 5)
+        goto jumptarget; // #GOTO223
+    }
+  } else {
+    j++;
+  }
+  return i + j;
+}
+
+__device__ int backwards_inside_labelled_loop(int cond) {
+  int i = 0;
+  int j = 0;
+ jumptarget: while (true) {
+  // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+  // expected-note@#GOTO240 {{jump from this goto statement affects 
convergence}}
+    foo();
+    i++;
+    j++;
+    if (cond > 5)
+      goto jumptarget; // #GOTO240
+  }
+  return i + j;
+}
+
+// Convergence is well-defined for a jump within a switch statement.
+__device__ int switch_sideways(int cond) {
+  int i = 0;
+  switch (cond) {
+  case 10:
+    foo();
+    i++;
+    break;
+  case 20:
+    foo();
+    i += 2;
+    goto jumptarget;
+    break;
+  case 30: {
+    foo();
+    i += 3;
+  jumptarget:
+    foo();
+    i += 4;
+    break;
+  }
+  default:
+    foo();
+    break;
+  }
+  return i;
+}
+
+// Convergence is implementation defined for a backward jump within a switch
+// statement.
+__device__ int switch_backwards(int cond) {
+  int i = 0;
+  switch (cond) {
+  case 10:
+    foo();
+    i++;
+    break;
+  case 30: {
+    foo();
+    i += 3;
+  jumptarget:
+    // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+    // expected-note@#GOTO286 {{jump from this goto statement affects 
convergence}}
+    foo();
+    i += 4;
+    break;
+  }
+  case 20:
+    foo();
+    i += 2;
+    goto jumptarget; // #GOTO286
+    break;
+  default:
+    foo();
+    break;
+  }
+  return i;
+}
+
+// Convergence is well-defined with fallthrough.
+__device__ int switch_fallthrough(int cond) {
+  int i = 0;
+  switch (cond) {
+  case 10:
+    foo();
+    i++;
+    break;
+  case 20:
+    foo();
+    i += 2;
+  case 30: {
+    foo();
+    i += 3;
+    i += 4;
+    break;
+  }
+  default:
+    foo();
+    break;
+  }
+  return i;
+}
+
+// Convergence is well-defined for forward jumps across a conditional 
statement.
+__device__ int forward_conditional(int cond) {
+  int i = 0;
+
+  if (cond > 0) {
+    foo();
+    i++;
+    goto jumptarget;
+  } else {
+  jumptarget:
+    foo();
+    i++;
+  }
+
+  return i;
+}
+
+// Convergence is implementation defined for a backwards jump across a
+// conditional statement.
+__device__ int backwards_conditional(int cond) {
+  int i = 0;
+
+  foo();
+  if (cond > 0) {
+    foo();
+  jumptarget:
+    // expected-warning@-1 {{convergence is implementation-defined due to a 
backwards goto}}
+    // expected-note@#GOTO341 {{jump from this goto statement affects 
convergence}}
+    i++;
+  } else {
+    foo();
+    i++;
+    goto jumptarget; // #GOTO341
+  }
+
+  return i;
+}
+
+__device__ int duffs_device(int count) {
+  int i = 0;
+  int n = (count + 7) / 8;
+    switch (count % 8) {
+    case 0: do { i++; // #LOOP351
+        foo();
+    case 7:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+      foo();
+    case 6:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+    case 5:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+    case 4:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+    case 3:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+    case 2:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+    case 1:      i++;
+        // expected-warning@#LOOP351 {{jump enters an iteration statement; 
convergence is implementation-defined}}
+        // expected-note@-2 {{jump to this case statement affects convergence 
of loop}}
+            } while (--n > 0);
+    }
+    return i;
+}

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang] Define convergence in C++ languages such as HIP, CUDA, OpenCL (PR #136280)

Reply via email to