https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589
--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> This looks like another nvidia driver problem (with driver version 384.111).

Confirmed.

The empty branch
>         @ %r36 bra $L5;
> $L5:
is translated into:
...
        /*0128*/               @P0 BRA `(.L_1);
.L_1:
...
so, no sync after the branch (or ssy before the branch).

Consequently, when executing the shfl.idx a bit later:
...
        /*0158*/                   SHFL.IDX PT, R0, R0, RZ, 0x1f;
        /*0168*/                   SHFL.IDX PT, R2, R2, RZ, 0x1f;
...
we are in divergent mode and get undefined results.

Inserting some sort of nop in the branched-around part:
...
        @ %r36 bra $L5;
        {
          .reg .u32 %nop_src;
          .reg .u32 %nop_dst;
          mov.u32 %nop_dst, %nop_src;
        }
$L5:
...
makes the test pass, because then we generate:
...
        /*0128*/                   SSY `(.L_1);
        /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
        /*0138*/                   SYNC (*"TARGET= .L_1 "*);
.L_1:
...
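
For experimenting with this workaround outside the compiler, the nop insertion can be applied mechanically to PTX text. The following is only a rough sketch, not the GCC nvptx backend code: the helper name pad_empty_branches and both regular expressions are hypothetical, it assumes one instruction or label per line, and it only recognizes a predicated "bra" whose target label is the very next line. The nop block itself is the one quoted in the comment.

```python
import re

# Hypothetical patterns, simplified for this sketch: a predicated branch such
# as "@ %r36 bra $L5;" and a label line such as "$L5:".
BRANCH_RE = re.compile(r"^\s*@\s*!?%\w+\s+bra\s+([\w$]+)\s*;\s*$")
LABEL_RE = re.compile(r"^\s*([\w$]+)\s*:\s*$")

# The nop block from the comment: a register move inside its own scope.
NOP_BLOCK = [
    "\t{",
    "\t  .reg .u32 %nop_src;",
    "\t  .reg .u32 %nop_dst;",
    "\t  mov.u32 %nop_dst, %nop_src;",
    "\t}",
]


def pad_empty_branches(ptx_lines):
    """Return a copy of ptx_lines with the nop block inserted after every
    predicated branch that jumps to the label on the following line."""
    out = []
    for i, line in enumerate(ptx_lines):
        out.append(line)
        branch = BRANCH_RE.match(line)
        if branch is None or i + 1 >= len(ptx_lines):
            continue
        label = LABEL_RE.match(ptx_lines[i + 1])
        # Only pad when the branch skips over nothing at all.
        if label is not None and label.group(1) == branch.group(1):
            out.extend(NOP_BLOCK)
    return out


# The empty branch from the comment.
ptx = ["\t@ %r36 bra $L5;", "$L5:"]
patched = pad_empty_branches(ptx)
print("\n".join(patched))
```

Branches that already skip over at least one instruction are left untouched. Whether the padded PTX then gets the SSY/SYNC pair still depends on the driver's JIT, so the resulting SASS would need to be checked again.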