On 04/09/2018 03:19 PM, Tom de Vries wrote:
Hi,

we've been having hanging OpenMP tests for nvptx offloading: for-{3,5,6}.c and the corresponding C++ test-cases.

The failures have now been analyzed down to gomp_ptrlock_get in libgomp/config/nvptx/ptrlock.h:
...
  static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
   uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
   if (v > 2)
     return (void *) v;

   if (v == 0
       && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
                                       MEMMODEL_ACQUIRE,
                                       MEMMODEL_ACQUIRE))
     return NULL;

   while (v == 1)
     v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

   return (void *) v;
}
...

There's no atomic load insn defined for nvptx, and also no memory barrier insn, so the atomic load ends up generating a normal load. The JIT compiler does loop-invariant code motion, and moves the load out of the loop, which turns the while into an eternal loop.


Fix conservatively by defining the memory_barrier insn. This can possibly be fixed more optimally by implementing an atomic load operation in nvptx.

Build x86_64 with nvptx accelerator and reg-tested libgomp.

Committed to stage4 trunk.


And back-ported to og7 branch.

Thanks,
- Tom

0001-nvptx-Add-memory_barrier-insn.patch


[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  <t...@codesourcery.com>

        PR target/84041
        * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
        (define_expand "*memory_barrier"): New define_expand.
        (define_insn "memory_barrier"): New insn.

---
  gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++
  1 file changed, 22 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@
     UNSPECV_CAS
     UNSPECV_XCHG
     UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
     UNSPECV_DIM_POS
UNSPECV_FORK
@@ -1459,6 +1460,27 @@
    "\\tbar.sync\\t%0;"
    [(set_attr "predicable" "false")])
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+       (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions threadfence_block, threadfence and
+;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+       (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
  (define_insn "nvptx_nounroll"
    [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
    ""


Reply via email to