https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85204
Bug ID: 85204
Summary: [nvptx] infinite loop generated
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Consider the following testcase (broadcast-1.c in the og7 branch):
...
#include <assert.h>
#include <math.h>
#define N 1024
int A[N][N] ;
void test(int x)
{
#pragma acc parallel num_gangs(16) num_workers(4) vector_length(32) copyout(A)
  {
#pragma acc loop gang
    for(int j=0;j<N;j++)
      {
        if (x==1)
          {
#pragma acc loop worker vector
            for(int i=0;i<N;i++)
              A[i][j] = 1;
          }
        else
          {
#pragma acc loop worker vector
            for(int i=0;i<N;i++)
              A[i][j] = -1;
          }
      }
  }
}
int main(void)
{
  test (0);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      assert (A[i][j] == -1);
  test (1);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      assert (A[i][j] == 1);
  return 0;
}
...
At -O2, the backend generates:
...
{
.reg .u32 %y;
mov.u32 %y,%tid.y;
setp.ne.u32 %r91,%y,0;
}
{
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r92,%x,0;
}
...
$L4:
@ %r91 bra.uni $L24;
selp.u32 %r95,1,0,%r80;
st.shared.u32 [__worker_bcast],%r95;
$L25:
$L24:
@ %r92 bra $L25;
...
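Note the eternal loop at the branch to $L25: the labels $L25 and $L24 sit
directly before the branch and nothing in between writes %r92, so any thread
with %tid.x != 0 takes the branch forever.
Not surprisingly, the testcase hangs.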
This looks like neutering gone wrong: the jump "@ %r92 bra $L25" is probably a
vector neutering jump and should be placed after the worker neutering jump
"@ %r91 bra.uni $L24".
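To make the hang concrete, the control flow around $L25/$L24 can be modelled
in plain C. This is only a sketch under stated assumptions: the function name
lane_exits_loop and the max_steps bound are hypothetical, and the model assumes
that no instruction between $L25 and the branch writes %r92, which is what the
generated code shown above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the generated branch, not GCC or PTX code.
   Returns true if a thread with the given %tid.x leaves the $L25/$L24
   loop within max_steps iterations, false if it is still spinning.  */
bool
lane_exits_loop (unsigned tid_x, unsigned max_steps)
{
  assert (max_steps > 0);

  /* setp.ne.u32 %r92,%x,0;  */
  bool r92 = (tid_x != 0);

  for (unsigned step = 0; step < max_steps; step++)
    {
      /* $L25: and $L24: are labels only; assumed that nothing here
         changes %r92.  */

      /* @ %r92 bra $L25;  */
      if (!r92)
        return true;	/* Branch not taken, the thread falls through.  */
    }

  return false;		/* Predicate never changed, still looping.  */
}
```

In this model only the thread with %tid.x == 0 falls through; every other
thread keeps branching back for as long as the bound allows, which matches the
observed hang.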
The failure was reported here:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02187.html (though the root cause
there was mis-analyzed, and the proposed patch is incorrect because it
introduces a diverging bra.uni).