Hi Cesar!

On Fri, 27 Jan 2017 07:45:52 -0800, Cesar Philippidis <ce...@codesourcery.com> wrote:
> If you take a close look at lower_omp_target, you'll notice that I
> gave reference types special treatment. Specifically, I disabled this
> optimization on non-INTEGER_TYPE and floating point values, because the
> nvptx target was having some problems dereferencing boolean-typed
> pointers. That's something I have on my TODO list to track down later.

Please file an issue as appropriate.

> As for the performance gains, this optimization resulted in a
> non-trivial speedup in CloverLeaf running on an Nvidia Pascal board.
> CloverLeaf is somewhat special in that it consists of a lot of OpenACC
> offloaded regions which get called multiple times throughout its
> execution. Consequently, it is I/O limited. The other benchmarks I ran
> didn't benefit nearly as much as CloverLeaf. I chose a small data set
> for CloverLeaf that only ran in 1.3s without the patch, and hence made
> it even more I/O limited. After the patch, it ran 0.35s faster.

\o/ Yay!

> This patch has been applied to gomp-4_0-branch.

(Not reviewed in detail.)

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +static tree
> +convert_from_firstprivate_pointer (tree var, bool is_ref, gimple_seq *gs)
> +{
> +  tree type = TREE_TYPE (var);
> +  tree new_type = NULL_TREE;
> +  tree tmp = NULL_TREE;
> +  tree inner_type = NULL_TREE;

    [...]/source-gcc/gcc/omp-low.c: In function 'tree_node* convert_from_firstprivate_pointer(tree, bool, gimple**)':
    [...]/source-gcc/gcc/omp-low.c:16515:8: warning: unused variable 'inner_type' [-Wunused-variable]


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90

I see:

    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (internal compiler error)+}
    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   4 blank line(s) in output+}
    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (test for excess errors)+}
    {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  compilation failed to produce executable+}

That's the nvptx offloading compiler configured with
"--enable-checking=yes,df,fold,rtl":

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90: In function 'MAIN__._omp_fn.1':
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=2)>(_17);

    _18 = VIEW_CONVERT_EXPR<logical(kind=2)>(_17);

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=4)>(_20);

    _21 = VIEW_CONVERT_EXPR<logical(kind=4)>(_20);

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=8)>(_23);

    _24 = VIEW_CONVERT_EXPR<logical(kind=8)>(_23);

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: error: conversion of register to a different size
    VIEW_CONVERT_EXPR<logical(kind=16)>(_26);

    _27 = VIEW_CONVERT_EXPR<logical(kind=16)>(_26);

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0: internal compiler error: verify_gimple failed
    0xa67d75 verify_gimple_in_cfg(function*, bool)
            [...]/source-gcc/gcc/tree-cfg.c:5125
    0x94ebbc execute_function_todo
            [...]/source-gcc/gcc/passes.c:1958
    0x94f513 execute_todo
            [...]/source-gcc/gcc/passes.c:2010
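
As a rough illustration of what the verifier appears to be complaining
about (a sketch in plain C, not code from the patch): as far as I
understand, a VIEW_CONVERT_EXPR only reinterprets bits, so operand and
result need to be of the same size. Assuming the firstprivate value
travels in a pointer-sized slot, it would first have to be narrowed with
an ordinary value conversion, and only then reinterpreted. The names
pack_u16, unpack_u16 and same_size_bits are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: carry a 2-byte value in a pointer-sized slot,
   in the spirit of passing a small scalar by value.  */
static uintptr_t pack_u16(uint16_t v)
{
  assert(sizeof(uintptr_t) >= sizeof(uint16_t));
  return (uintptr_t) v;
}

/* Narrow with a value conversion; this is allowed to change the size.  */
static uint16_t unpack_u16(uintptr_t slot)
{
  return (uint16_t) slot;
}

/* Reinterpret bits between two types of the SAME size; this is the only
   kind of step a pure "view" conversion can express.  */
static int16_t same_size_bits(uint16_t v)
{
  int16_t out;
  _Static_assert(sizeof out == sizeof v, "view conversion needs equal sizes");
  memcpy(&out, &v, sizeof out);
  return out;
}
```

With these, unpack_u16 (pack_u16 (0x1234)) gives back 0x1234, and
same_size_bits is only ever applied after the narrowing step.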


And with "-m32" multilib testing, I see:

    {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  (test for excess errors)+}
    {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  compilation failed to produce executable+}

That is:

    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:10:18: Error: Kind 16 not supported for type INTEGER at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:16:18: Error: Kind 16 not supported for type LOGICAL at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:115:18: Error: Kind 16 not supported for type INTEGER at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:121:18: Error: Kind 16 not supported for type LOGICAL at (1)
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:31:6: Error: Symbol 'i16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:49:40: Error: Symbol 'i16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:37:6: Error: Symbol 'l16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:51:40: Error: Symbol 'l16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:43: Error: Symbol 'i16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:105:69: Error: Symbol 'i16o' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:43: Error: Symbol 'l16i' at (1) has no IMPLICIT type
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:106:69: Error: Symbol 'l16o' at (1) has no IMPLICIT type


Grüße
 Thomas


> @@ -0,0 +1,203 @@
> +! Verify the GOMP_MAP_FIRSTPRIVATE_INT optimization on various types.
> +
> +program test
> +  implicit none
> +
> +  integer (kind=1)  :: i1i, i1o
> +  integer (kind=2)  :: i2i, i2o
> +  integer (kind=4)  :: i4i, i4o
> +  integer (kind=8)  :: i8i, i8o
> +  integer (kind=16) :: i16i, i16o
> +
> +  logical (kind=1)  :: l1i, l1o
> +  logical (kind=2)  :: l2i, l2o
> +  logical (kind=4)  :: l4i, l4o
> +  logical (kind=8)  :: l8i, l8o
> +  logical (kind=16) :: l16i, l16o
> +
> +  real (kind=4)  :: r4i, r4o
> +  real (kind=8)  :: r8i, r8o
> +
> +  complex (kind=4)  :: c4i, c4o
> +  complex (kind=8)  :: c8i, c8o
> +
> +  character (kind=1) :: ch1i, ch1o
> +  character (kind=4) :: ch4i, ch4o
> +
> +  i1i = 1
> +  i2i = 2
> +  i4i = 3
> +  i8i = 4
> +  i16i = 5
> +
> +  l1i = .true.
> +  l2i = .false.
> +  l4i = .true.
> +  l8i = .true.
> +  l16i = .false.
> +
> +  r4i = .5
> +  r8i = .25
> +
> +  c4i = (2, -2)
> +  c8i = (4, -4)
> +
> +  ch1i = "a"
> +  ch4i = "b"
> +
> +  !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> +  !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> +  !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> +  !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> +  !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> +  !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> +  !$acc firstprivate(ch1i, ch4i) &
> +  !$acc copyout(ch1o, ch4o)
> +  i1o = i1i
> +  i2o = i2i
> +  i4o = i4i
> +  i8o = i8i
> +  i16o = i16i
> +
> +  l1o = l1i
> +  l2o = l2i
> +  l4o = l4i
> +  l8o = l8i
> +  l16o = l16i
> +
> +  r4o = r4i
> +  r8o = r8i
> +
> +  c4o = c4i
> +  c8o = c8i
> +
> +  ch1o = ch1i
> +  ch4o = ch4i
> +  !$acc end parallel
> +
> +  if (i1i /= i1o) call abort
> +  if (i2i /= i2o) call abort
> +  if (i4i /= i4o) call abort
> +  if (i8i /= i8o) call abort
> +  if (i16i /= i16o) call abort
> +
> +  if (l1i .neqv. l1o) call abort
> +  if (l2i .neqv. l2o) call abort
> +  if (l4i .neqv. l4o) call abort
> +  if (l8i .neqv. l8o) call abort
> +  if (l16i .neqv. l16o) call abort
> +
> +  if (r4i /= r4o) call abort
> +  if (r8i /= r8o) call abort
> +
> +  if (c4i /= c4o) call abort
> +  if (c8i /= c8o) call abort
> +
> +  if (ch1i /= ch1o) call abort
> +  if (ch4i /= ch4o) call abort
> +
> +  call subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> +               l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> +               r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> +               ch1i, ch4i, ch1o, ch4o)
> +end program test
> +
> +subroutine subtest(i1i, i2i, i4i, i8i, i16i, i1o, i2o, i4o, i8o, i16o, &
> +                   l1i, l2i, l4i, l8i, l16i, l1o, l2o, l4o, l8o, l16o, &
> +                   r4i, r8i, r4o, r8o, c4i, c8i, c4o, c8o, &
> +                   ch1i, ch4i, ch1o, ch4o)
> +  implicit none
> +
> +  integer (kind=1)  :: i1i, i1o
> +  integer (kind=2)  :: i2i, i2o
> +  integer (kind=4)  :: i4i, i4o
> +  integer (kind=8)  :: i8i, i8o
> +  integer (kind=16) :: i16i, i16o
> +
> +  logical (kind=1)  :: l1i, l1o
> +  logical (kind=2)  :: l2i, l2o
> +  logical (kind=4)  :: l4i, l4o
> +  logical (kind=8)  :: l8i, l8o
> +  logical (kind=16) :: l16i, l16o
> +
> +  real (kind=4)  :: r4i, r4o
> +  real (kind=8)  :: r8i, r8o
> +
> +  complex (kind=4)  :: c4i, c4o
> +  complex (kind=8)  :: c8i, c8o
> +
> +  character (kind=1) :: ch1i, ch1o
> +  character (kind=4) :: ch4i, ch4o
> +
> +  i1i = -i1i
> +  i2i = -i2i
> +  i4i = -i4i
> +  i8i = -i8i
> +  i16i = -i16i
> +
> +  l1i = .not. l1i
> +  l2i = .not. l2i
> +  l4i = .not. l4i
> +  l8i = .not. l8i
> +  l16i = .not. l16i
> +
> +  r4i = -r4i
> +  r8i = -r8i
> +
> +  c4i = -c4i
> +  c8i = -c8i
> +
> +  ch1i = "z"
> +  ch4i = "y"
> +
> +  !$acc parallel firstprivate(i1i, i2i, i4i, i8i, i16i) &
> +  !$acc copyout(i1o, i2o, i4o, i8o, i16o) &
> +  !$acc firstprivate(l1i, l2i, l4i, l8i, l16i) &
> +  !$acc copyout(l1o, l2o, l4o, l8o, l16o) &
> +  !$acc firstprivate(r4i, r8i) copyout(r4o, r8o) &
> +  !$acc firstprivate(c4i, c8i) copyout(c4o, c8o) &
> +  !$acc firstprivate(ch1i, ch4i) &
> +  !$acc copyout(ch1o, ch4o)
> +  i1o = i1i
> +  i2o = i2i
> +  i4o = i4i
> +  i8o = i8i
> +  i16o = i16i
> +
> +  l1o = l1i
> +  l2o = l2i
> +  l4o = l4i
> +  l8o = l8i
> +  l16o = l16i
> +
> +  r4o = r4i
> +  r8o = r8i
> +
> +  c4o = c4i
> +  c8o = c8i
> +
> +  ch1o = ch1i
> +  ch4o = ch4i
> +  !$acc end parallel
> +
> +  if (i1i /= i1o) call abort
> +  if (i2i /= i2o) call abort
> +  if (i4i /= i4o) call abort
> +  if (i8i /= i8o) call abort
> +  if (i16i /= i16o) call abort
> +
> +  if (l1i .neqv. l1o) call abort
> +  if (l2i .neqv. l2o) call abort
> +  if (l4i .neqv. l4o) call abort
> +  if (l8i .neqv. l8o) call abort
> +  if (l16i .neqv. l16o) call abort
> +
> +  if (r4i /= r4o) call abort
> +  if (r8i /= r8o) call abort
> +
> +  if (c4i /= c4o) call abort
> +  if (c8i /= c8o) call abort
> +
> +  if (ch1i /= ch1o) call abort
> +  if (ch4i /= ch4o) call abort
> +end subroutine subtest
