This patch changes the default gang partition size so that it is determined
at runtime, by interrogating the hardware.
The auto-loop test is designed for num_gangs 32. The Fortran nested function
test appears to also require that, but may be hiding another defect. Its use of
gang(static:1) seems a little strange and is not justified by the comments
discussing its use.
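To illustrate the intent, here is a minimal sketch of how a zero default can act as a "defer to runtime" sentinel. It is not the actual nvptx or libgomp code: GANG_DEFAULT, resolve_gangs and the hw_gangs parameter are hypothetical names invented for this example, and the precedence order shown is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for the compile-time default.
   Zero means "no fixed default, let the runtime decide".  */
#define GANG_DEFAULT 0

/* Hypothetical helper, not a GCC or libgomp function.
   REQUESTED is an explicit gang count (0 if none was given) and
   HW_GANGS is whatever the runtime derived from the device.  */
static int
resolve_gangs (int requested, int hw_gangs)
{
  assert (hw_gangs > 0);

  /* An explicit request always wins.  */
  if (requested > 0)
    return requested;

  /* A non-zero compile-time default comes next.  */
  if (GANG_DEFAULT > 0)
    return GANG_DEFAULT;

  /* Otherwise fall back to the value queried from the hardware.  */
  return hw_gangs;
}
```

Under these assumptions, a test whose expected results depend on exactly 32 gangs has to request that size explicitly, which is what the two testsuite changes do with -fopenacc-dim=32.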
nathan
2016-08-28 Nathan Sidwell <nat...@codesourcery.com>
gcc/
* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Set to zero.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Set gang
dimension.
* testsuite/libgomp.oacc-fortran/nested-function-1.f90: Likewise.
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- gcc/config/nvptx/nvptx.c (revision 239868)
+++ gcc/config/nvptx/nvptx.c (working copy)
@@ -4157,7 +4157,7 @@ nvptx_expand_builtin (tree exp, rtx targ
/* Define dimension sizes for known hardware. */
#define PTX_VECTOR_LENGTH 32
#define PTX_WORKER_LENGTH 32
-#define PTX_GANG_DEFAULT 32
+#define PTX_GANG_DEFAULT 0 /* Defer to runtime. */
/* Validate compute dimensions of an OpenACC offload or routine, fill
in non-unity defaults. FN_LEVEL indicates the level at which a
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c (revision 239868)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c (working copy)
@@ -2,6 +2,8 @@
not optimized away at -O0, and then confuses the target assembler.
{ dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+/* { dg-additional-options "-fopenacc-dim=32" } */
+
#include <stdio.h>
#include <openacc.h>
@@ -151,8 +153,7 @@ int gang_1 (int *ary, int size)
{
clear (ary, size);
-#pragma acc parallel num_gangs (32) num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
- /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 154 } */
+#pragma acc parallel num_gangs (32) num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } } */
{
#pragma acc loop auto
for (int jx = 0; jx < size / 64; jx++)
Index: libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90 (revision 239868)
+++ libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90 (working copy)
@@ -1,6 +1,7 @@
! Exercise nested function decomposition, gcc/tree-nested.c.
! { dg-do run }
+! { dg-additional-options "-fopenacc-dim=32" }
program collapse2
call test1