Hi,

we've been seeing hangs in the OpenMP tests for-{3,5,6}.c (and the
corresponding C++ test-cases) with nvptx offloading.
The failures have now been analyzed down to gomp_ptrlock_get in
libgomp/config/nvptx/ptrlock.h:
...
static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
  uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
  if (v > 2)
    return (void *) v;

  if (v == 0
      && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
				      MEMMODEL_ACQUIRE,
				      MEMMODEL_ACQUIRE))
    return NULL;

  while (v == 1)
    v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

  return (void *) v;
}
...
There's no atomic load insn defined for nvptx, and no memory barrier
insn either, so the atomic load ends up generating a normal load. The
JIT compiler does loop-invariant code motion and moves the load out of
the loop, which turns the while into an infinite loop.
Fix this conservatively by defining the memory_barrier insn. This can
possibly be fixed more optimally by implementing an atomic load
operation for nvptx.
Built on x86_64 with nvptx accelerator and reg-tested libgomp.
Committed to stage4 trunk.
Thanks,
- Tom
[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  <t...@codesourcery.com>

	PR target/84041
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
	(define_expand "memory_barrier"): New define_expand.
	(define_insn "*memory_barrier"): New insn.
---
gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@
    UNSPECV_CAS
    UNSPECV_XCHG
    UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
    UNSPECV_DIM_POS
    UNSPECV_FORK
    UNSPECV_FORKED
@@ -1459,6 +1460,27 @@
   "\\tbar.sync\\t%0;"
   [(set_attr "predicable" "false")])
 
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions __threadfence_block, __threadfence and
+;; __threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
 (define_insn "nvptx_nounroll"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
   ""