https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111246

            Bug ID: 111246
           Summary: PPC64 Sequentially Consistent Load allows Reordering
                    of Stores
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luke.geeson at cs dot ucl.ac.uk
  Target Milestone: ---

Consider the following litmus test that captures bad behaviour:

```
C test

{ *x = 0; *y = 0; } // fixed initial state where x and y are 0

// Concurrent threads
P0 (atomic_int* y,atomic_int* x) {
  atomic_store_explicit(x,2,memory_order_relaxed);
  atomic_thread_fence(memory_order_consume);
  atomic_store_explicit(y,1,memory_order_seq_cst);
}

P1 (atomic_int* y,atomic_int* x) {
  int r0 = atomic_load_explicit(y,memory_order_seq_cst);
  atomic_thread_fence(memory_order_relaxed);
  atomic_store_explicit(x,1,memory_order_relaxed);
}

exists ([x]=2 /\ 1:r0=1)

```
where 'P0:r0 = 0' means thread P0, local variable r0 has value 0. [x] means x
is a global.

When simulating this test under the C/C++ model from its initial state, the
outcome of execution in the exists clause is forbidden by the source model. The
allowed outcomes are:
```
{ P1:r0=0; [x]=1; }
{ P1:r0=0; [x]=2; }
{ P1:r0=1; [x]=1; }
```
When compiling to target PPC64le with GCC trunk `-c -g -O1 -pthread --std=c11
-ffreestanding -mabi=elfv1` (https://godbolt.org/z/xj9W7nr9G), the compiled
program has the following outcomes when simulated under the PPC model:
```
{ P1:r0=0; [x]=1; }                           
{ P1:r0=0; [x]=2; }                                                             
{ P1:r0=1; [x]=1; }    
{ P1:r0=1; [x]=2; } <--- forbidden by source model, bug!
```
which occurs since the `b L0x50; isync` pattern allows the memory effect of the
load of y on thread P1 to be reordered after the store to x on P1. We observe
the execution:

{ P1:r0=0;[x]=0} -> P1:W[x]=1 -> P0:W[x]=2 -> P0:W[y]=1 -> P1:R[y]=0 -> {
P1:r0=1; [x]=2; } 

where the local variable `r0` is stored in register r9 in the compiled code.

The problem is isync forces the instructions to finish, but not their memory
effects. Hence the forbidden outcome { P1:r0=1; [x]=2; } is allowed under the
ppc model.

```
PPC test

{ [P1_r0]=0;[x]=0;[y]=0;
  uint64_t %P0_x=x; uint64_t %P0_y=y;
  uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
  uint64_t %P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(*                                                               *)
(*****************************************************************)
  P0                 |  P1                    ;
   li r10,2          |   sync                 ;
   stw r10,0(%P0_x)  |   lwz r9,0(%P1_y)      ;
   lwsync            |   cmpw r9,r9           ;
   sync              |   b   L0x50            ;
   li r10,1          |  L0x50: isync          ;
   stw r10,0(%P0_y)  |   li r8,1              ;
                     |   stw r8,0(%P1_x)      ;
                     |   stw r9,0(%P1_P1_r0)  ;

exists ([x]=2 /\ P1_r0=1)  // YES under PPC model

```
We have observed this behaviour with several hundred Message Passing and 'S'
pattern tests (the above is an S pattern test).

To fix this, we advise replacing the isync with lwfence to forbid the buggy
behaviour:

```
PPC test-fixed

{ [P1_r0]=0;[x]=0;[y]=0;
  uint64_t %P0_x=x; uint64_t %P0_y=y;
  uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
  uint64_t %P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(*                                                               *)
(*****************************************************************)
  P0                 |  P1                    ;
   li r10,2          |   sync                 ;
   stw r10,0(%P0_x)  |   lwz r9,0(%P1_y)      ;
   lwsync            |   cmpw r9,r9           ;
   sync              |   b   L0x50            ;
   li r10,1          |  L0x50: lwfence        ;
   stw r10,0(%P0_y)  |   li r8,1              ;
                     |   stw r8,0(%P1_x)      ;
                     |   stw r9,0(%P1_P1_r0)  ;


exists ([x]=2 /\ P1_r0=1)  // NO under PPC model

```
which no longer allows the buggy outcome under the ppc model:
```
{ P1:r0=0; [x]=1; }                           
{ P1:r0=0; [x]=2; }                                                             
{ P1:r0=1; [x]=1; } 
```

I'm happy to discuss this as needed.

Reply via email to