Hi Cesar!

On Fri, 27 Jan 2017 07:45:52 -0800, Cesar Philippidis <ce...@codesourcery.com> wrote:
> If you take a close look at lower_omp_target, you'll notice that I
> gave reference types special treatment.  Specifically, I disabled this
> optimization on non-INTEGER_TYPE and floating point values, because the
> nvptx target was having some problems dereferencing boolean-typed
> pointers.  That's something I have on my TODO list to track down later.
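
For readers following along: the general idea behind a "firstprivate int" style mapping, as I understand it, is to carry a small scalar by value in a pointer-sized argument instead of passing its address. The following is only a minimal C sketch of that idea under stated assumptions, not the code in omp-low.c or libgomp; slot_t and the pack_*/unpack_* helpers are hypothetical names. It reinterprets bits only between types of equal size and changes width with ordinary value conversions, which is the property the "conversion of register to a different size" errors further down in this message are about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; these are not libgomp or GCC
   functions.  A "slot" stands in for one pointer-sized mapping argument.  */
typedef uintptr_t slot_t;

/* Host side: store a 16-bit integer in the slot by value.  */
static slot_t pack_int16(int16_t v)
{
    uint16_t bits;
    memcpy(&bits, &v, sizeof bits);  /* reinterpret at the SAME size */
    return (slot_t) bits;            /* then widen with a value conversion */
}

/* Device side: recover the 16-bit integer from the slot.  */
static int16_t unpack_int16(slot_t s)
{
    uint16_t bits = (uint16_t) s;    /* narrow with a value conversion first */
    int16_t v;
    memcpy(&v, &bits, sizeof v);     /* then reinterpret at the SAME size */
    return v;
}

/* Same idea for a boolean: only the truth value needs to survive.  */
static slot_t pack_bool(bool b) { return (slot_t) (b ? 1 : 0); }
static bool unpack_bool(slot_t s) { return s != 0; }

/* Round-trip check over a few representative values.  */
static void check_roundtrip(void)
{
    assert(unpack_int16(pack_int16(-5)) == -5);
    assert(unpack_int16(pack_int16(INT16_MIN)) == INT16_MIN);
    assert(unpack_bool(pack_bool(true)) == true);
    assert(unpack_bool(pack_bool(false)) == false);
}
```

Calling check_roundtrip() from a main function exercises both helper pairs. Values wider than the slot (for example a 128-bit integer on a 64-bit host) cannot travel this way and would still have to be passed by reference.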

Please file an issue as appropriate.

> As for the performance gains, this optimization resulted in a
> non-trivial speedup in CloverLeaf running on an Nvidia Pascal board.
> CloverLeaf is somewhat special in that it consists of a lot of OpenACC
> offloaded regions which get called multiple times throughout its
> execution.  Consequently, it is I/O limited.  The other benchmarks I ran
> didn't benefit nearly as much as CloverLeaf.  I chose a small data set
> for CloverLeaf that only ran in 1.3s without the patch, and hence made
> it even more I/O limited.  After the patch, it ran 0.35s faster.

\o/ Yay!

> This patch has been applied to gomp-4_0-branch.

(Not reviewed in detail.)

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +static tree
> +convert_from_firstprivate_pointer (tree var, bool is_ref, gimple_seq *gs)
> +{
> +  tree type = TREE_TYPE (var);
> +  tree new_type = NULL_TREE;
> +  tree tmp = NULL_TREE;
> +  tree inner_type = NULL_TREE;

    [...]/source-gcc/gcc/omp-low.c: In function 'tree_node* convert_from_firstprivate_pointer(tree, bool, gimple**)':
    [...]/source-gcc/gcc/omp-low.c:16515:8: warning: unused variable 'inner_type' [-Wunused-variable]

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90

I see:

    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O (internal compiler error)+}
    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O 4 blank line(s) in output+}
    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O (test for excess errors)+}
    {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O compilation failed to produce executable+}

That's the nvptx offloading compiler configured with "--enable-checking=yes,df,fold,rtl":

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90: In function 'MAIN__._omp_fn.1':
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=2)>(_17);
    _18 = VIEW_CONVERT_EXPR<logical(kind=2)>(_17);
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=4)>(_20);
    _21 = VIEW_CONVERT_EXPR<logical(kind=4)>(_20);
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=8)>(_23);
    _24 = VIEW_CONVERT_EXPR<logical(kind=8)>(_23);
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=16)>(_26);
    _27 = VIEW_CONVERT_EXPR<logical(kind=16)>(_26);
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: internal compiler error: verify_gimple failed
    0xa67d75 verify_gimple_in_cfg(function*, bool)
            [...]/source-gcc/gcc/tree-cfg.c:5125
    0x94ebbc execute_function_todo
            [...]/source-gcc/gcc/passes.c:1958
    0x94f513 execute_todo
            [...]/source-gcc/gcc/passes.c:2010

And with "-m32" multilib testing, I see:

    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O (test for excess errors)+}
    {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O compilation failed to produce executable+}

That is:

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:10:18: Error: Kind 16 not supported for type INTEGER at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:16:18: Error: Kind 16 not supported for type LOGICAL at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:115:18: Error: Kind 16 not supported for type INTEGER at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:121:18: Error: Kind 16 not supported for type LOGICAL at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:31:6: Error: Symbol 'i16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:49:40: Error: Symbol 'i16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:37:6: Error: Symbol 'l16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:51:40: Error: Symbol 'l16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:43: Error: Symbol 'i16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:69: Error: Symbol 'i16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:43: Error: Symbol 'l16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:69: Error: Symbol 'l16o' at (1) has no IMPLICIT type


Regards,
 Thomas


> @@ -0,0 +1,203 @@
> +! Verify the GOMP_MAP_FIRSTPRIVATE_INT optimization on various types.
> +
> +program test
> +  implicit none
> +
> +  integer (kind=1)  :: i1i, i1o
> +  integer (kind=2)  :: i2i, i2o
> +  integer (kind=4)  :: i4i, i4o
> +  integer (kind=8)  :: i8i, i8o
> +  integer (kind=16) :: i16i, i16o
> +
> +  logical (kind=1)  :: l1i, l1o
> +  logical (kind=2)  :: l2i, l2o
> +  logical (kind=4)  :: l4i, l4o
> +  logical (kind=8)  :: l8i, l8o
> +  logical (kind=16) :: l16i, l16o
> +
> +  real (kind=4)  :: r4i, r4o
> +  real (kind=8)  :: r8i, r8o
> +
> +  complex (kind=4)  :: c4i, c4o
> +  complex (kind=8)  :: c8i, c8o
> +
> +  character (kind=1) :: ch1i, ch1o
> +  character (kind=4) :: ch4i, ch4o
> +
> +  i1i = 1
> +  i2i = 2
> +  i4i = 3
> +  i8i = 4
> +  i16i = 5
> +
> +  l1i = .true.
> +  l2i = .false.
> +  l4i = .true.
> +  l8i = .true.
> +  l16i = .false.
> +
> +  r4i = .5
> +  r8i = .25
> +
> +  c4i = (2, -2)
> +  c8i = (4, -4)
> +
> +  ch1i = "a"
> +  ch4i = "b"
> +
> +  !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> +  !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> +  !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> +  !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> +  !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> +  !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> +  !$acc firstprivate(ch1i, ch4i) &
> +  !$acc copyout(ch1o, ch4o)
> +  i1o = i1i
> +  i2o = i2i
> +  i4o = i4i
> +  i8o = i8i
> +  i16o = i16i
> +
> +  l1o = l1i
> +  l2o = l2i
> +  l4o = l4i
> +  l8o = l8i
> +  l16o = l16i
> +
> +  r4o = r4i
> +  r8o = r8i
> +
> +  c4o = c4i
> +  c8o = c8i
> +
> +  ch1o = ch1i
> +  ch4o = ch4i
> +  !$acc end parallel
> +
> +  if (i1i /= i1o) call abort
> +  if (i2i /= i2o) call abort
> +  if (i4i /= i4o) call abort
> +  if (i8i /= i8o) call abort
> +  if (i16i /= i16o) call abort
> +
> +  if (l1i .neqv. l1o) call abort
> +  if (l2i .neqv. l2o) call abort
> +  if (l4i .neqv. l4o) call abort
> +  if (l8i .neqv. l8o) call abort
> +  if (l16i .neqv. l16o) call abort
> +
> +  if (r4i /= r4o) call abort
> +  if (r8i /= r8o) call abort
> +
> +  if (c4i /= c4o) call abort
> +  if (c8i /= c8o) call abort
> +
> +  if (ch1i /= ch1o) call abort
> +  if (ch4i /= ch4o) call abort
> +
> +  call subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> +               l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> +               r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> +               ch1i, ch4i, ch1o, ch4o)
> +end program test
> +
> +subroutine subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> +                   l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> +                   r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> +                   ch1i, ch4i, ch1o, ch4o)
> +  implicit none
> +
> +  integer (kind=1)  :: i1i, i1o
> +  integer (kind=2)  :: i2i, i2o
> +  integer (kind=4)  :: i4i, i4o
> +  integer (kind=8)  :: i8i, i8o
> +  integer (kind=16) :: i16i, i16o
> +
> +  logical (kind=1)  :: l1i, l1o
> +  logical (kind=2)  :: l2i, l2o
> +  logical (kind=4)  :: l4i, l4o
> +  logical (kind=8)  :: l8i, l8o
> +  logical (kind=16) :: l16i, l16o
> +
> +  real (kind=4)  :: r4i, r4o
> +  real (kind=8)  :: r8i, r8o
> +
> +  complex (kind=4)  :: c4i, c4o
> +  complex (kind=8)  :: c8i, c8o
> +
> +  character (kind=1) :: ch1i, ch1o
> +  character (kind=4) :: ch4i, ch4o
> +
> +  i1i = -i1i
> +  i2i = -i2i
> +  i4i = -i4i
> +  i8i = -i8i
> +  i16i = -i16i
> +
> +  l1i = .not. l1i
> +  l2i = .not. l2i
> +  l4i = .not. l4i
> +  l8i = .not. l8i
> +  l16i = .not. l16i
> +
> +  r4i = -r4i
> +  r8i = -r8i
> +
> +  c4i = -c4i
> +  c8i = -c8i
> +
> +  ch1i = "z"
> +  ch4i = "y"
> +
> +  !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> +  !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> +  !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> +  !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> +  !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> +  !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> +  !$acc firstprivate(ch1i, ch4i) &
> +  !$acc copyout(ch1o, ch4o)
> +  i1o = i1i
> +  i2o = i2i
> +  i4o = i4i
> +  i8o = i8i
> +  i16o = i16i
> +
> +  l1o = l1i
> +  l2o = l2i
> +  l4o = l4i
> +  l8o = l8i
> +  l16o = l16i
> +
> +  r4o = r4i
> +  r8o = r8i
> +
> +  c4o = c4i
> +  c8o = c8i
> +
> +  ch1o = ch1i
> +  ch4o = ch4i
> +  !$acc end parallel
> +
> +  if (i1i /= i1o) call abort
> +  if (i2i /= i2o) call abort
> +  if (i4i /= i4o) call abort
> +  if (i8i /= i8o) call abort
> +  if (i16i /= i16o) call abort
> +
> +  if (l1i .neqv. l1o) call abort
> +  if (l2i .neqv. l2o) call abort
> +  if (l4i .neqv. l4o) call abort
> +  if (l8i .neqv. l8o) call abort
> +  if (l16i .neqv. l16o) call abort
> +
> +  if (r4i /= r4o) call abort
> +  if (r8i /= r8o) call abort
> +
> +  if (c4i /= c4o) call abort
> +  if (c8i /= c8o) call abort
> +
> +  if (ch1i /= ch1o) call abort
> +  if (ch4i /= ch4o) call abort
> +end subroutine subtest
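
A note on the "-m32" errors reported earlier in this message: they show that kind=16 INTEGER and LOGICAL are not available in that configuration, so a test that declares them unconditionally cannot compile there. As a rough C analogy only (the Fortran testsuite would need its own mechanism, which this does not attempt to show), a 128-bit case can be made conditional on the predefined macro __SIZEOF_INT128__, which GCC defines when __int128 is supported. The names widest_int, HAVE_INT128, copy_through, widest_int_bytes and check_widest are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* GCC predefines __SIZEOF_INT128__ only when the target supports __int128.
   widest_int and HAVE_INT128 are hypothetical names for this sketch.  */
#ifdef __SIZEOF_INT128__
typedef __int128 widest_int;
#define HAVE_INT128 1
#else
typedef int64_t widest_int;
#define HAVE_INT128 0
#endif

/* Mirrors the "out = in" copy in the test, at the widest available width.  */
static widest_int copy_through(widest_int in)
{
    widest_int out = in;
    return out;
}

/* Size in bytes of the type the sketch ended up using.  */
static int widest_int_bytes(void)
{
    return (int) sizeof(widest_int);
}

/* Compiles and passes whether or not the 128-bit type exists.  */
static void check_widest(void)
{
    assert(copy_through((widest_int) -5) == (widest_int) -5);
    assert(widest_int_bytes() == (HAVE_INT128 ? 16 : 8));
}
```

The point of the sketch is only that the widest case is selected at compile time, so the same source builds on both 64-bit and 32-bit multilibs.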