https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83920
--- Comment #10 from cesar at gcc dot gnu.org ---

And here is the working code in -O2:

        {
                .reg.u32 %x;
                mov.u32 %x, %tid.x;
                setp.ne.u32 %r71, %x, 0;
        }
        @%r71 bra $L13;
        mov.u64 %r45, %ar0;
        mov.u64 %r46, %ar1;
        mov.u32 %r42, %ctaid.x;
        shl.b32 %r48, %r42, 2;
        add.u32 %r37, %r48, %r42;
        mov.u32 %r31, 5;
        setp.ne.u64 %r64, %r46, 1;
        mov.u32 %r66, 0;
$L13:
$L3:
        mov.pred %r74, %r64;
        setp.eq.u32 %r64, 1, 0;
        @%r71 bra $L12;
$L12:
        mov.pred %r64, %r74;
        selp.u32 %r75, 1, 0, %r64;
        shfl.idx.b32 %r75, %r75, 0, 31;
        setp.ne.u32 %r64, %r75, 0;
        @%r64 bra.uni $L2;
$L6:

Notice how gcse's PRE pass hoisted the initialization of %r64 early in the
entry block.

I think we should go with my patch. If the register is live, it shouldn't
require your workaround.
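
For anyone reading along, the predicate broadcast at $L12 in the PTX above can be modeled in plain C. This is only an illustrative sketch, not code from the nvptx backend: warp_broadcast and propagate_pred are hypothetical names, and a warp is assumed to be an array of 32 lanes. It shows why only lane 0 needs a valid %r64 before the shuffle: the other lanes skip the initialization via the branch to $L13, and their values are overwritten by lane 0's.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32  /* assumed lanes per warp */

/* Model of a shuffle with a fixed source lane: every lane receives
   the value held by SRC_LANE.  */
void
warp_broadcast (unsigned vals[WARP_SIZE], unsigned src_lane)
{
  assert (src_lane < WARP_SIZE);
  unsigned v = vals[src_lane];
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    vals[lane] = v;
}

/* Hypothetical helper modeling the $L12 sequence: lane 0 holds a valid
   predicate, the other lanes may hold anything, and after the shuffle
   all lanes agree with lane 0.  */
void
propagate_pred (bool pred[WARP_SIZE])
{
  unsigned tmp[WARP_SIZE];

  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    tmp[lane] = pred[lane] ? 1 : 0;     /* selp.u32 %r75, 1, 0, %r64 */

  warp_broadcast (tmp, 0);              /* shfl.idx.b32 %r75, %r75, 0, 31 */

  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    pred[lane] = tmp[lane] != 0;        /* setp.ne.u32 %r64, %r75, 0 */
}
```

In this model, whatever the non-zero lanes held in pred[] before the call has no effect on the result, which matches the generated code only reading lane 0's copy of %r64.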