https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Hmm, the atom insn sets a register that is not used anywhere.  So the shuffle
communicating the result doesn't make much sense.

We can fix that by doing:
...
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index c6cec0c27c2..60d02c02452 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3265,7 +3265,9 @@ static bool
 nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
 {
   rtx reg;
-  if (GET_CODE (set) == SET && REG_P (reg = SET_DEST (set)))
+  if (GET_CODE (set) == SET
+      && REG_P (reg = SET_DEST (set))
+      && find_reg_note (insn, REG_UNUSED, reg) == NULL_RTX)
     {
       emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX),
                        insn);
...

But that gives us a warp sync instead of a shuffle:
...
$L2:
        ld.u64 %r29,[%r27];
@ %r33  atom.add.u32 %r30,[%r29],1;
        bar.warp.sync 0xffffffff;
...
so the problem of the hang persists.

But, if we roll back the recent change of commit 8e5c34ab45f ("[nvptx] Use
nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt"), the test-case
passes.
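
For illustration only, the condition added by the patch above can be modelled outside of GCC roughly as follows. This is a toy sketch, not nvptx.cc code: ToyInsn and wants_shuffle are hypothetical names, and the vector of register numbers merely stands in for the REG_UNUSED notes that find_reg_note would look up. It models only the decision whether to emit the shuffle, not the warp sync or the hang.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an RTL insn; holds only what the predicate needs.
struct ToyInsn {
    bool is_set;                   // models GET_CODE (set) == SET
    bool dest_is_reg;              // models REG_P (SET_DEST (set))
    int dest_reg;                  // number of the register being written
    std::vector<int> unused_regs;  // models REG_UNUSED notes on the insn
};

// True when the result register would be broadcast with a shuffle:
// the insn is a SET to a register that has no REG_UNUSED note.
bool wants_shuffle(const ToyInsn &insn) {
    if (!insn.is_set || !insn.dest_is_reg)
        return false;
    bool unused = std::find(insn.unused_regs.begin(), insn.unused_regs.end(),
                            insn.dest_reg) != insn.unused_regs.end();
    return !unused;
}

// Small usage check mirroring the case discussed in the comment.
void run_examples() {
    ToyInsn used{true, true, 30, {}};      // result is read later
    ToyInsn dead{true, true, 30, {30}};    // like %r30: written, never used
    assert(wants_shuffle(used));
    assert(!wants_shuffle(dead));
}
```

In this model the atom.add.u32 writing %r30 corresponds to the second case, so no shuffle would be requested for it.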