On 29/06/2020 21:16, Julian Brown wrote:
> Data-share write (ds_write) instructions do not necessarily complete
> the write to LDS immediately. When a write completes, LGKM_CNT is
> decremented. For now, we wait until LGKM_CNT reaches zero after each
> ds_write instruction.
>
> This fixes a race condition in the case where LDS is read immediately
> after being written. This can happen with broadcast operations.
>
> OK for og10 branch?

I'm not saying no (because this issue needs a fix), but the thought
occurs that inserting one wait before the barrier might be better than
inserting a wait after each and every write.
In particular, it seems logical that any barrier should also be a memory
barrier, so inserting the wait in the barrier pattern is not a big deal.
IIRC, only OpenACC uses that pattern anyway (OpenMP has explicit asm
inserts in libgomp).
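
To illustrate the trade-off, here is a rough Python toy model (not GCC
code) that counts how many waits each placement adds to an instruction
stream. The instruction strings, the helper names and the example stream
are all made up for illustration; only the idea of "wait after every
ds_write" versus "one wait before the barrier" comes from the discussion
above. It says nothing about whether a read that is not separated from
the write by a barrier is safe.

```python
# Toy model only: instructions are plain strings, nothing is assembled or run.
# WAIT stands for "wait until LGKM_CNT reaches zero".
WAIT = "s_waitcnt lgkmcnt(0)"


def wait_after_each_write(stream):
    """Placement from the patch: a wait after every ds_write."""
    out = []
    for insn in stream:
        out.append(insn)
        if insn.startswith("ds_write"):
            out.append(WAIT)
    return out


def wait_before_barrier(stream):
    """Suggested alternative: one wait immediately before each barrier."""
    out = []
    for insn in stream:
        if insn == "s_barrier":
            out.append(WAIT)
        out.append(insn)
    return out


# Hypothetical broadcast-like sequence: several writes, a barrier, then a read.
stream = [
    "ds_write_b32 v0, v1",
    "ds_write_b32 v0, v2 offset:4",
    "ds_write_b32 v0, v3 offset:8",
    "s_barrier",
    "ds_read_b32 v4, v0",
]

per_write = wait_after_each_write(stream)
per_barrier = wait_before_barrier(stream)

print(per_write.count(WAIT), per_barrier.count(WAIT))  # prints "3 1"
```

With three writes feeding one barrier, the per-write placement emits
three waits and the per-barrier placement emits one.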
WDYT?
Andrew