Hello,

I'm testing the 'target teams' implementation on the gomp-nvptx branch, and I'm seeing test failures that I believe are caused by faulty tests. (Unlike NVPTX, MIC offloading uses a single team, as native execution does, so the issues are not exposed there.)
First, there's libgomp.c/target-31.c, where the following check triggers:

 39     if (c != 3 || d != 4 || g[0] != 9 || g[1] != 10 || h[0] != 11 || h[1] != 12 || k != 14 || m[0] != 17 || m[1] != 18)
 40       #pragma omp atomic write
 41         err = 1;

The first failing clause is 'g[0] != 9'; g[0] is actually 11, because another team has already incremented it at line 57. If I remove ' || g[0] != 9 || g[1] != 10' from line 39, the test passes.

Second, there are failures in libgomp.c/examples-4/teams-{3,4}.c and their Fortran counterparts. The issue is that 'sum' is reduced only across loop iterations within each team, not across all teams. I'm using the following patch to add the missing reduction on the teams construct. Is that correct?

Apart from these, I don't see other test regressions.

Thanks.
Alexander

diff --git a/libgomp/testsuite/libgomp.c/examples-4/teams-3.c b/libgomp/testsuite/libgomp.c/examples-4/teams-3.c
index 5fe63a6..3765bab 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/teams-3.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/teams-3.c
@@ -31,7 +31,8 @@ float dotprod (float B[], float C[], int n)
   int i;
   float sum = 0;
 
-  #pragma omp target teams map(to: B[0:n], C[0:n]) map(tofrom: sum)
+  #pragma omp target teams map(to: B[0:n], C[0:n]) map(tofrom: sum) \
+    reduction(+:sum)
   #pragma omp distribute parallel for reduction(+:sum)
   for (i = 0; i < n; i++)
     sum += B[i] * C[i];
diff --git a/libgomp/testsuite/libgomp.c/examples-4/teams-4.c b/libgomp/testsuite/libgomp.c/examples-4/teams-4.c
index 6136eab..d0c586c 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/teams-4.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/teams-4.c
@@ -32,7 +32,7 @@ float dotprod (float B[], float C[], int n)
   float sum = 0;
 
   #pragma omp target map(to: B[0:n], C[0:n]) map(tofrom:sum)
-  #pragma omp teams num_teams(8) thread_limit(16)
+  #pragma omp teams num_teams(8) thread_limit(16) reduction(+:sum)
   #pragma omp distribute parallel for reduction(+:sum) \
                 dist_schedule(static, 1024) \
                 schedule(static, 64)
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/teams-3.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/teams-3.f90
index 2588d8b..aca57ee 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/teams-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/teams-3.f90
@@ -14,7 +14,7 @@ function dotprod (B, C, N) result(sum)
   real :: B(N), C(N), sum
   integer :: N, i
   sum = 0.0e0
-  !$omp target teams map(to: B, C)
+  !$omp target teams map(to: B, C) reduction(+:sum)
   !$omp distribute parallel do reduction(+:sum)
     do i = 1, N
       sum = sum + B(i) * C(i)
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/teams-4.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/teams-4.f90
index efae3c3..f214315 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/teams-4.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/teams-4.f90
@@ -15,7 +15,7 @@ function dotprod (B, C, n) result(sum)
   integer :: N, i
   sum = 0.0e0
   !$omp target map(to: B, C)
-  !$omp teams num_teams(8) thread_limit(16)
+  !$omp teams num_teams(8) thread_limit(16) reduction(+:sum)
   !$omp distribute parallel do reduction(+:sum) &
   !$omp& dist_schedule(static, 1024) schedule(static, 64)
     do i = 1, N