https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111246
Bug ID: 111246 Summary: PPC64 Sequentially Consistent Load allows Reordering of Stores Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: luke.geeson at cs dot ucl.ac.uk Target Milestone: --- Consider the following litmus test that captures bad behaviour: ``` C test { *x = 0; *y = 0; } // fixed initial state where x and y are 0 // Concurrent threads P0 (atomic_int* y,atomic_int* x) { atomic_store_explicit(x,2,memory_order_relaxed); atomic_thread_fence(memory_order_consume); atomic_store_explicit(y,1,memory_order_seq_cst); } P1 (atomic_int* y,atomic_int* x) { int r0 = atomic_load_explicit(y,memory_order_seq_cst); atomic_thread_fence(memory_order_relaxed); atomic_store_explicit(x,1,memory_order_relaxed); } exists ([x]=2 /\ 1:r0=1) ``` where 'P0:r0 = 0' means thread P0, local variable r0 has value 0. [x] means x is a global. When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are: ``` { P1:r0=0; [x]=1; } { P1:r0=0; [x]=2; } { P1:r0=1; [x]=1; } ``` When compiling to target PPC64le with GCC trunk `-c -g -O1 -pthread --std=c11 -ffreestanding -mabi=elfv1` (https://godbolt.org/z/xj9W7nr9G), the compiled program has the following outcomes when simulated under the PPC model: ``` { P1:r0=0; [x]=1; } { P1:r0=0; [x]=2; } { P1:r0=1; [x]=1; } { P1:r0=1; [x]=2; } <--- forbidden by source model, bug! ``` which occurs since the `b L0x50; isync` pattern allows the memory effect of the load of y on thread P1 to be reordered after the store to x on P1. We observe the execution: { P1:r0=0;[x]=0} -> P1:W[x]=1 -> P0:W[x]=2 -> P0:W[y]=1 -> P1:R[y]=0 -> { P1:r0=1; [x]=2; } where the local variable `r0` is stored in register r9 in the compiled code. The problem is isync forces the instructions to finish, but not their memory effects. Hence the forbidden outcome { P1:r0=1; [x]=2; } is allowed under the ppc model. ``` PPC test { [P1_r0]=0;[x]=0;[y]=0; uint64_t %P0_x=x; uint64_t %P0_y=y; uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x; uint64_t %P1_y=y } (*****************************************************************) (* the Telechat toolsuite *) (* *) (* Luke Geeson, University College London, UK. *) (* *) (*****************************************************************) P0 | P1 ; li r10,2 | sync ; stw r10,0(%P0_x) | lwz r9,0(%P1_y) ; lwsync | cmpw r9,r9 ; sync | b L0x50 ; li r10,1 | L0x50: isync ; stw r10,0(%P0_y) | li r8,1 ; | stw r8,0(%P1_x) ; | stw r9,0(%P1_P1_r0) ; exists ([x]=2 /\ P1_r0=1) // YES under PPC model ``` We have observed this behaviour with several hundred Message Passing and 'S' pattern tests (the above is an S pattern test). To fix this, we advise replacing the isync with lwfence to forbid the buggy behaviour: ``` PPC test-fixed { [P1_r0]=0;[x]=0;[y]=0; uint64_t %P0_x=x; uint64_t %P0_y=y; uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x; uint64_t %P1_y=y } (*****************************************************************) (* the Telechat toolsuite *) (* *) (* Luke Geeson, University College London, UK. *) (* *) (*****************************************************************) P0 | P1 ; li r10,2 | sync ; stw r10,0(%P0_x) | lwz r9,0(%P1_y) ; lwsync | cmpw r9,r9 ; sync | b L0x50 ; li r10,1 | L0x50: lwfence ; stw r10,0(%P0_y) | li r8,1 ; | stw r8,0(%P1_x) ; | stw r9,0(%P1_P1_r0) ; exists ([x]=2 /\ P1_r0=1) // NO under PPC model ``` which no longer allows the buggy outcome under the ppc model: ``` { P1:r0=0; [x]=1; } { P1:r0=0; [x]=2; } { P1:r0=1; [x]=1; } ``` I'm happy to discuss this as needed.