On 11/02/15 14:41, Jakub Jelinek wrote:
> Does this work even with -O0? I mean, the assembler is invalid for any target other than PTX, so you are relying on aggressively folding this away.
Correct. As thread identification is inherently target-specific, I don't see how to do otherwise.
We know __builtin_acc_on_device is folded for a constant argument, and -O2 enables dead code elimination, so non-PTX targets (such as the host) never see that assembler.
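
For illustration only, here is a minimal sketch of the pattern under discussion. It is not the code from the patch. ON_PTX_DEVICE and thread_id_x are hypothetical names: the macro stands in for a device query with a constant argument and is fixed at 0 so the sketch builds as an ordinary host program.

```c
/* Hypothetical stand-in for a device query such as
   __builtin_acc_on_device (...) with a constant argument.  Fixed at 0
   here (an assumption) so this builds as a plain host program.  */
#define ON_PTX_DEVICE 0

/* Hypothetical helper: returns the PTX thread index in the x dimension,
   or 0 when not compiled for a PTX device.  */
static unsigned
thread_id_x (void)
{
  if (ON_PTX_DEVICE)
    {
      unsigned tid;
      /* PTX-only instruction.  Any other assembler would reject it, so
         this branch has to be removed before code generation.  */
      __asm__ volatile ("mov.u32 %0, %%tid.x;" : "=r" (tid));
      return tid;
    }
  return 0;
}
```

On the host the condition is a constant 0, so thread_id_x () returns 0 and the asm is only harmless if the compiler drops the dead branch, which is exactly the dependence on folding described above.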
nathan