On 03/02/2018 05:55 PM, Cesar Philippidis wrote:
In addition, nvptx_cta_sync and the corresponding nvptx_barsync insn
have been extended to take a barrier ID and a thread count. The idea
here is to assign one barrier for each logical vector. Worker-single
synchronization is controlled by barrier 0. Therefore, the vector
barrier ID is set to tid.y+1 (because there's one vector unit per
worker) in nvptx_init_oacc_workers and placed into a register stored in
cfun->machine->sync_bar. If no workers are present, then the barrier ID
falls back to 0.
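
For reference, the barrier numbering described above can be modelled as a small C sketch. This is only an illustration of the scheme as I read it, not code from the patch: the helper name vector_barrier_id, the have_workers flag, and the WORKER_BARRIER_ID macro are all hypothetical, and the real value is computed in PTX and kept in the register stored in cfun->machine->sync_bar.

```c
#include <stdbool.h>

/* Hypothetical constant: worker-single synchronization uses barrier 0.  */
#define WORKER_BARRIER_ID 0u

/* Hypothetical helper modelling the vector barrier ID choice.
   TID_Y stands for the thread's tid.y value, i.e. its worker index.
   With workers present, each worker's vector gets its own barrier,
   offset by one so that it never collides with the worker barrier.
   Without workers, the ID falls back to 0.  */
static unsigned
vector_barrier_id (unsigned tid_y, bool have_workers)
{
  return have_workers ? tid_y + 1 : WORKER_BARRIER_ID;
}
```

Under this model, worker 0's vector synchronizes on barrier 1, worker 1's on barrier 2, and so on, while barrier 0 stays reserved for worker-level synchronization.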

I compiled a worker loop before and after the patch series, and observed this change:
...
@@ -70,7 +71,7 @@
  $L2:
   // joining 2;
  $L5:
-  bar.sync 1;
+  bar.sync 0;
   // join 2;
   ret;
 }
...

AFAICT from your explanation above, that change is intentional.

Changing the code generation scheme for workers is fine, but obviously that should be a minimal, separate patch that we can bisect back to.

Thanks,
- Tom
