Hi,

we've been seeing OpenMP tests hang with nvptx offloading: for-{3,5,6}.c and the corresponding C++ test cases.

The hangs have now been traced to gomp_ptrlock_get in libgomp/config/nvptx/ptrlock.h:
...
static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
  uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
  if (v > 2)
    return (void *) v;

  if (v == 0
      && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
                                      MEMMODEL_ACQUIRE,
                                      MEMMODEL_ACQUIRE))
    return NULL;

  while (v == 1)
    v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

  return (void *) v;
}
...

There is no atomic load insn defined for nvptx, and no memory barrier insn either, so the atomic load ends up being generated as a normal load. The JIT compiler then does loop-invariant code motion and moves the load out of the loop, which turns the while into an infinite loop.
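
To illustrate the effect of the hoisting, here is a minimal single-threaded sketch. It is not libgomp code: struct fake_ptrlock, fake_load, wait_reloading, wait_hoisted and the budget parameter are all hypothetical names introduced for this illustration. The fake lock word reads as 1 ("busy") for a few polls and then as the published value, and the budget only exists so that the hoisted variant terminates in the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the lock word: it reads as 1 ("busy") until
   polls_left reaches zero, and as the published value afterwards.  */
struct fake_ptrlock
{
  unsigned polls_left;
  uintptr_t published;
};

/* Stands in for the memory read of *ptrlock.  */
static uintptr_t
fake_load (struct fake_ptrlock *p)
{
  assert (p != NULL);
  if (p->polls_left > 0)
    {
      p->polls_left--;
      return 1;
    }
  return p->published;
}

/* The loop as written in the source: the load is repeated on every
   iteration, so the published value is eventually observed.  */
static uintptr_t
wait_reloading (struct fake_ptrlock *p, unsigned budget)
{
  uintptr_t v = fake_load (p);
  while (v == 1 && budget-- > 0)
    v = fake_load (p);
  return v;
}

/* The loop after the load has been moved out of it: the value is read
   once before the loop and never refreshed, so only the artificial
   budget makes this return.  */
static uintptr_t
wait_hoisted (struct fake_ptrlock *p, unsigned budget)
{
  uintptr_t v = fake_load (p);
  uintptr_t hoisted = fake_load (p);	/* load moved out of the loop */
  while (v == 1 && budget-- > 0)
    v = hoisted;
  return v;
}
```

With a lock word that stays busy for three polls, wait_reloading returns the published value, while wait_hoisted keeps seeing 1 until its budget runs out.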


Fix conservatively by defining the memory_barrier insn. A more optimal fix may be to implement an atomic load operation for nvptx.

Built for x86_64 with the nvptx accelerator and regression-tested libgomp.
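
As a rough source-level picture of the conservative fix, the sketch below writes the acquire load as a plain load followed by an explicit fence. It assumes that this is roughly how the generic expansion treats an atomic load once a memory_barrier insn is available. The helper names load_then_fence and wait_for_value are hypothetical and are not part of libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a plain load followed by an acquire fence.
   The fence also acts as a compiler barrier, so the load cannot be
   treated as loop-invariant by the caller.  Written as a plain load
   only to mirror the assumed expansion; real code should keep using
   __atomic_load_n.  */
static uintptr_t
load_then_fence (uintptr_t *ptrlock)
{
  assert (ptrlock != NULL);
  uintptr_t v = *ptrlock;			/* normal load */
  __atomic_thread_fence (__ATOMIC_ACQUIRE);	/* barrier after the load */
  return v;
}

/* Same shape as the tail of gomp_ptrlock_get: spin while the lock
   word reads as 1, then return whatever was published.  */
static void *
wait_for_value (uintptr_t *ptrlock)
{
  uintptr_t v = load_then_fence (ptrlock);
  while (v == 1)
    v = load_then_fence (ptrlock);
  return (void *) v;
}
```

With the patch applied, a fence like this would be expected to show up as membar.sys in the generated PTX, which should keep the load inside the loop.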

Committed to stage4 trunk.

Thanks,
- Tom
[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  <t...@codesourcery.com>

	PR target/84041
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
	(define_expand "memory_barrier"): New define_expand.
	(define_insn "*memory_barrier"): New insn.

---
 gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@
    UNSPECV_CAS
    UNSPECV_XCHG
    UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
    UNSPECV_DIM_POS
 
    UNSPECV_FORK
@@ -1459,6 +1460,27 @@
   "\\tbar.sync\\t%0;"
   [(set_attr "predicable" "false")])
 
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions threadfence_block, threadfence and
+;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
 (define_insn "nvptx_nounroll"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
   ""
