On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> +#pragma acc routine gang
> +void __attribute__ ((noinline)) gang (int ary[N])
> +{
> +#pragma acc loop gang
> +    for (unsigned ix = 0; ix < N; ix++)
> +      {
> +     if (__builtin_acc_on_device (5))
> +       {
> +         int g = 0, w = 0, v = 0;
> +
> +         __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
> +         __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
> +         __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
> +         ary[ix] = (g << 16) | (w << 8) | v;
> +       }
> +     else
> +       ary[ix] = ix;

Does this work even with -O0?  I mean, the inline asm is invalid
for any target other than PTX, so you are relying on the compiler
aggressively folding the __builtin_acc_on_device branch away.

        Jakub
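
To illustrate the concern, here is a minimal sketch of a guard that does not
depend on folding at all: the PTX-only asm is hidden behind a preprocessor
check, so a non-PTX compiler never sees it, even at -O0.  The function name
position_or is hypothetical and not part of the patch, and the sketch assumes
the compiler defines __nvptx__ only when targeting PTX.  It also only helps
where host and device code are preprocessed separately; if the offload
compiler reuses the host's already-preprocessed code, the guard selects the
host branch everywhere.

```c
/* Hypothetical helper, not part of the patch: returns the packed
   gang/worker/vector position on PTX and FALLBACK anywhere else.  */
unsigned
position_or (unsigned fallback)
{
#if defined (__nvptx__)  /* Assumption: defined only when targeting PTX.  */
  unsigned g = 0, w = 0, v = 0;

  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
  (void) fallback;
  return (g << 16) | (w << 8) | v;
#else
  /* The PTX asm is removed by the preprocessor, so nothing has to be
     folded away for this to assemble at -O0.  */
  return fallback;
#endif
}
```

On a non-PTX host, position_or (ix) simply returns ix, which matches the
else branch of the quoted loop.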
