Hi!

On 2022-04-12T15:45:03+0200, Richard Biener <rguent...@suse.de> wrote:
> On Tue, 12 Apr 2022, Thomas Schwinge wrote:
>> On 2022-04-07T15:04:15+0200, Richard Biener via Gcc-patches 
>> <gcc-patches@gcc.gnu.org> wrote:
>> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> >> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> >> > > this patch fixes miscompilation of gnatmake.  Modref attempts to 
>> >> > > track memory
>> >> > > accesses relative to the base pointers which are parameters of 
>> >> > > functions.
>> >> > > If it fails, it still makes difference between unknown memory access 
>> >> > > and
>> >> > > global memory access.  The second makes it possible to disambiguate 
>> >> > > with
>> >> > > memory that is not accessible from outside world (i.e. everything 
>> >> > > that does
>> >> > > not escape from the caller function).  This is useful so we do not 
>> >> > > punt
>> >> > > when unknown function is called.
>> >> > >
>> >> > > Now I added ref_may_access_global_memory_p to tree-ssa-alias whic is 
>> >> > > using
>> >> > > ptr_deref_may_alias_global_p.  There is however a shift in meaning of 
>> >> > > this
>> >> > > predicate: the second tests that the dereference may alias with 
>> >> > > global variable.
>> >> > >
>> >> > > In the testcase we are disambiguating heap allocated escaping memory 
>> >> > > which is
>> >> > > not a global variable but it is still a global memory in the modref's 
>> >> > > sense.
>> >> > > So we need to test in addition contains_escaped.
>> >> > >
>> >> > > The patch simply copies logic from the predicate and adds the check.
>> >> > > I am not sure if there is better way to handle this?
>> >> >
>> >> > I'm testing the following variant which exposes this detail
>> >> > (escaped local memory global or not) in the APIs that say "global"
>> >> > which allows to remove ref_may_access_global_memory_p.
>> >>
>> >> Thank you.  Indeed it is better to have an explicit flag, since the
>> >> clash of names is bit sensitive.
>> >
>> > OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
>> > and now pushed.
>>
>> This commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
>> "ipa/104303 - miscompilation of gnatmake" is causing one regression in
>> nvptx offload testing:
>>
>>     [...]
>>     [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/private-variables.f90 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1   at 
>> line 142 (test for bogus messages, line 131)
>>     [...]
>>
>> I've done a before/after 'diff' of
>> '-fdump-tree-all -foffload-options=nvptx-none=-fdump-tree-all'
>> with all functions and calls other than 't4' commented out.
>
> I suppose the
>
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index 2a13ea34829..34ce8abe33a 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -315,7 +315,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool
> aggressive)
>      }
>
>    if ((gimple_vdef (stmt) && keep_all_vdefs_p ())
> -      || stmt_may_clobber_global_p (stmt))
> +      || stmt_may_clobber_global_p (stmt, true))
>      {
>        mark_stmt_necessary (stmt, true);
>        return;
>
> was overly conservative (I was probably misled by keep_all_vdefs_p ()),
> passing false to stmt_may_clobber_global_p should fix the regression?

It does, thanks.  Per more 'diff'ing, this change again enables
the '-O1' host compilation 'a-private-variables.f90.117t.dce2' to
clean this up, and likewise for nvptx offload compilation
'a.xnvptx-none.mkoffload.117t.dce2', thus the extra diagnostic
again disappears.  In fact, for all optimization flags variants
that I've tried, we've again got the exactly same dumps as before
commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
"ipa/104303 - miscompilation of gnatmake"!

So I'll run that through standard bootstrap/regression testing, and push?
Got a suggestion for rationale to put into the commit log?


Grüße
 Thomas


>> For '-O0', there's no difference at all.
>>
>> For '-O1', for host compilation we see:
>>
>>     diff -ru 0-O1/a-private-variables.f90.117t.dce2 
>> ./a-private-variables.f90.117t.dce2
>>     --- 0-O1/a-private-variables.f90.117t.dce2      2022-04-12 
>> 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.117t.dce2 2022-04-12 12:51:43.726304109 
>> +0200
>>     @@ -30,9 +30,13 @@
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>     +  pt.x = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt.y = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt.z = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt.attr[4] = _29;
>>
>>        <bb 4> [local count: 1073741824]:
>>        # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>     diff -ru 0-O1/a-private-variables.f90.118t.stdarg 
>> ./a-private-variables.f90.118t.stdarg
>>     --- 0-O1/a-private-variables.f90.118t.stdarg    2022-04-12 
>> 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.118t.stdarg       2022-04-12 
>> 12:51:43.726304109 +0200
>>     @@ -4,6 +4,7 @@
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>        integer(kind=4) _25;
>>     @@ -25,9 +26,13 @@
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>     +  pt.x = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt.y = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt.z = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt.attr[4] = _29;
>>
>>        <bb 4> [local count: 1073741824]:
>>        # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>     [Similar for following passes/dumps.]
>>     diff -ru 0-O1/a-private-variables.f90.141t.lim2 
>> ./a-private-variables.f90.141t.lim2
>>     --- 0-O1/a-private-variables.f90.141t.lim2      2022-04-12 
>> 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.141t.lim2 2022-04-12 12:51:43.730304125 
>> +0200
>>     @@ -24,11 +24,42 @@
>>      ;; 5 succs { 7 6 }
>>      ;; 7 succs { 3 }
>>      ;; 6 succs { 1 }
>>     +
>>     +Symbols to be put in SSA form
>>     +{ D.4340 D.4356 D.4357 D.4358 D.4359 D.4360 D.4361 D.4362 D.4363 }
>>     +Incremental SSA update started at block: 0
>>     +Number of blocks in CFG: 9
>>     +Number of blocks to update: 8 ( 89%)
>>     +
>>     +
>>     +
>>     +SSA replacement table
>>     +N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>>     +
>>     +pt_x_lsm.22_1 -> { pt_x_lsm.22_72 }
>>     +pt_z_lsm.24_20 -> { pt_z_lsm.24_9 }
>>     +pt_attr_I_lsm.25_21 -> { pt_attr_I_lsm.25_10 }
>>     +pt_y_lsm.23_22 -> { pt_y_lsm.23_73 }
>>     +Incremental SSA update started at block: 3
>>     +Number of blocks in CFG: 9
>>     +Number of blocks to update: 3 ( 33%)
>>     +
>>     +
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  integer(kind=4) D.4363;
>>     +  integer(kind=4) pt_attr_I_lsm.25;
>>     +  integer(kind=4) D.4361;
>>     +  integer(kind=4) pt_z_lsm.24;
>>     +  integer(kind=4) D.4359;
>>     +  integer(kind=4) pt_y_lsm.23;
>>     +  integer(kind=4) D.4357;
>>     +  integer(kind=4) pt_x_lsm.22;
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>     +  integer(kind=4) _7(D);
>>        integer(kind=4) _25;
>>        integer(kind=4) _27;
>>        integer(kind=4) _29;
>>     @@ -37,21 +68,32 @@
>>        integer(kind=8) _43;
>>        integer(kind=4)[1025] * _45;
>>        integer(kind=4) _46;
>>     +  integer(kind=4) _47(D);
>>        integer(kind=4) _48;
>>        integer(kind=4) _50;
>>     +  integer(kind=4) _51(D);
>>        integer(kind=4) _52;
>>        integer(kind=4) _54;
>>        integer(kind=4) .offset.10_56;
>>        integer(kind=4) .offset.15_63;
>>     +  integer(kind=4) _70(D);
>>
>>        <bb 2> [local count: 7128820]:
>>     +  pt_x_lsm.22_8 = _7(D);
>>     +  pt_y_lsm.23_49 = _47(D);
>>     +  pt_z_lsm.24_53 = _51(D);
>>     +  pt_attr_I_lsm.25_71 = _70(D);
>>        _45 = *.omp_data_i_44(D).arr;
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>     +  pt_x_lsm.22_72 = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt_y_lsm.23_73 = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt_z_lsm.24_9 = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt_attr_I_lsm.25_10 = _29;
>>        _41 = .offset.15_2 * 32;
>>
>>        <bb 4> [local count: 1073741824]:
>>     @@ -84,6 +126,14 @@
>>        goto <bb 3>; [100.00%]
>>
>>        <bb 6> [local count: 35644102]:
>>     +  # pt_z_lsm.24_20 = PHI <pt_z_lsm.24_9(5)>
>>     +  # pt_attr_I_lsm.25_21 = PHI <pt_attr_I_lsm.25_10(5)>
>>     +  # pt_x_lsm.22_1 = PHI <pt_x_lsm.22_72(5)>
>>     +  # pt_y_lsm.23_22 = PHI <pt_y_lsm.23_73(5)>
>>     +  pt.attr[4] = pt_attr_I_lsm.25_21;
>>     +  pt.z = pt_z_lsm.24_20;
>>     +  pt.y = pt_y_lsm.23_22;
>>     +  pt.x = pt_x_lsm.22_1;
>>        return;
>>
>>      }
>>     [Similar for following passes/dumps.]
>>     diff -ru 0-O1/a-private-variables.f90.148t.dse3 
>> ./a-private-variables.f90.148t.dse3
>>     --- 0-O1/a-private-variables.f90.148t.dse3      2022-04-12 
>> 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.148t.dse3 2022-04-12 12:51:43.730304125 
>> +0200
>>     @@ -1,11 +1,24 @@
>>
>>      ;; Function t4_._omp_fn.0 (t4_._omp_fn.0, funcdef_no=3, decl_uid=4278, 
>> cgraph_uid=4, symbol_order=3)
>>
>>     +Removing basic block 7
>>     +Removing basic block 8
>>     +Removing basic block 9
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  integer(kind=4) D.4363;
>>     +  integer(kind=4) pt_attr_I_lsm.25;
>>     +  integer(kind=4) D.4361;
>>     +  integer(kind=4) pt_z_lsm.24;
>>     +  integer(kind=4) D.4359;
>>     +  integer(kind=4) pt_y_lsm.23;
>>     +  integer(kind=4) D.4357;
>>     +  integer(kind=4) pt_x_lsm.22;
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>     +  integer(kind=4) _7(D);
>>        integer(kind=4) _25;
>>        integer(kind=4) _27;
>>        integer(kind=4) _29;
>>     @@ -14,25 +27,28 @@
>>        integer(kind=8) _43;
>>        integer(kind=4)[1025] * _45;
>>        integer(kind=4) _46;
>>     +  integer(kind=4) _47(D);
>>        integer(kind=4) _48;
>>        integer(kind=4) _50;
>>     +  integer(kind=4) _51(D);
>>        integer(kind=4) _52;
>>        integer(kind=4) _54;
>>        integer(kind=4) .offset.10_56;
>>        integer(kind=4) .offset.15_63;
>>     +  integer(kind=4) _70(D);
>>
>>        <bb 2> [local count: 7128820]:
>>        _45 = *.omp_data_i_44(D).arr;
>>
>>        <bb 3> [local count: 87490071]:
>>     -  # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>     +  # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>        _25 = .offset.15_2 * 2;
>>        _27 = .offset.15_2 * 4;
>>        _29 = .offset.15_2 * 6;
>>        _41 = .offset.15_2 * 32;
>>
>>        <bb 4> [local count: 1073741824]:
>>     -  # .offset.10_4 = PHI <0(3), .offset.10_56(8)>
>>     +  # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>        _42 = .offset.10_4 + _41;
>>        _43 = (integer(kind=8)) _42;
>>        _46 = (*_45)[_43];
>>     @@ -43,23 +59,17 @@
>>        (*_45)[_43] = _54;
>>        .offset.10_56 = .offset.10_4 + 1;
>>        if (.offset.10_56 <= 31)
>>     -    goto <bb 8>; [89.00%]
>>     +    goto <bb 4>; [89.00%]
>>        else
>>          goto <bb 5>; [11.00%]
>>
>>     -  <bb 8> [local count: 955630224]:
>>     -  goto <bb 4>; [100.00%]
>>     -
>>        <bb 5> [local count: 437450365]:
>>        .offset.15_63 = .offset.15_2 + 1;
>>        if (.offset.15_63 <= 31)
>>     -    goto <bb 7>; [89.00%]
>>     +    goto <bb 3>; [89.00%]
>>        else
>>          goto <bb 6>; [11.00%]
>>
>>     -  <bb 7> [local count: 389330825]:
>>     -  goto <bb 3>; [100.00%]
>>     -
>>        <bb 6> [local count: 35644102]:
>>        return;
>>
>>     [Similar for following passes/dumps.]
>>
>> ..., so in 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}'
>> assignments for the new 'struct vec3 pt;' get cleaned out, so that should
>> all be fine; no actual changes in the end.
>>
>> Comparing '-O1' nvptx offload target compilation before/after, the first
>> difference is in 'a.xnvptx-none.mkoffload.117t.dce2': similar to host
>> compilation.  But then, in the following things do not get cleaned up as
>> they do for the host compilation; the 'pt.{x,y,z,attr}' assignments for
>> the new 'struct vec3 pt;' persist:
>>
>>     diff -ru 0-O1/a.xnvptx-none.mkoffload.252t.optimized 
>> ./a.xnvptx-none.mkoffload.252t.optimized
>>     --- 0-O1/a.xnvptx-none.mkoffload.252t.optimized 2022-04-12 
>> 08:36:54.569303204 +0200
>>     +++ ./a.xnvptx-none.mkoffload.252t.optimized    2022-04-12 
>> 12:51:43.774304292 +0200
>>     @@ -7,34 +7,36 @@
>>      __attribute__((oacc function (32, 8, 32), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     -  unsigned int ivtmp$6;
>>     +  unsigned int ivtmp$8;
>>        unsigned int ivtmp$5;
>>     +  unsigned int ivtmp$3;
>>        int D.1527;
>>        int D.1524;
>>     +  struct vec3 pt;
>>        int _2;
>>     +  int[1025] * _4;
>>     +  int _22;
>>        int _23;
>>        int _25;
>>        int _27;
>>        int _29;
>>        int _34;
>>     -  int[1025] * _43;
>>     -  sizetype _45;
>>     -  sizetype _46;
>>     -  int[1025] * _47;
>>     -  unsigned int _48;
>>     +  sizetype _41;
>>     +  unsigned int _42;
>>     +  unsigned int _43;
>>     +  unsigned int _45;
>>     +  int _46;
>>        int _49;
>>     -  unsigned int _50;
>>        int _51;
>>     -  int _52;
>>     -  int _53;
>>     -  int _63;
>>     -  int _82;
>>     -  int _83;
>>     -  int _87;
>>     +  int _54;
>>     +  sizetype _63;
>>        int _95;
>>     +  int _96;
>>     +  int _99;
>>        int _102;
>>     -  int _103;
>>     +  int[1025] * _103;
>>        int _104;
>>     +  int _105;
>>        int _107;
>>
>>        <bb 2> [local count: 7128820]:
>>     @@ -47,43 +49,50 @@
>>          goto <bb 7>; [73.00%]
>>
>>        <bb 3> [local count: 1924781]:
>>     -  _87 = _104 * 32;
>>     -  ivtmp$5_80 = (unsigned int) _87;
>>     -  _52 = _104 * 2;
>>     -  ivtmp$6_54 = (unsigned int) _52;
>>     +  ivtmp$3_61 = (unsigned int) _104;
>>     +  _54 = _104 * 2;
>>     +  ivtmp$5_56 = (unsigned int) _54;
>>     +  _46 = _104 * 32;
>>     +  ivtmp$8_48 = (unsigned int) _46;
>>     +  _42 = (unsigned int) _23;
>>
>>        <bb 4> [local count: 87490071]:
>>     -  # _2 = PHI <_104(3), _63(6)>
>>     -  # ivtmp$5_22 = PHI <ivtmp$5_80(3), ivtmp$5_81(6)>
>>     -  # ivtmp$6_72 = PHI <ivtmp$6_54(3), ivtmp$6_56(6)>
>>     -  _25 = (int) ivtmp$6_72;
>>     -  _50 = ivtmp$6_72 * 2;
>>     -  _27 = (int) _50;
>>     -  _48 = ivtmp$6_72 * 3;
>>     -  _29 = (int) _48;
>>     +  # ivtmp$3_106 = PHI <ivtmp$3_61(3), ivtmp$3_47(6)>
>>     +  # ivtmp$5_87 = PHI <ivtmp$5_56(3), ivtmp$5_72(6)>
>>     +  # ivtmp$8_52 = PHI <ivtmp$8_48(3), ivtmp$8_50(6)>
>>     +  _2 = (int) ivtmp$3_106;
>>     +  pt.x = _2;
>>     +  _25 = (int) ivtmp$5_87;
>>     +  pt.y = _25;
>>     +  _45 = ivtmp$5_87 * 2;
>>     +  _27 = (int) _45;
>>     +  pt.z = _27;
>>     +  _43 = ivtmp$5_87 * 3;
>>     +  _29 = (int) _43;
>>     +  pt.attr[4] = _29;
>>        _34 = .UNIQUE (OACC_FORK, 0, 2);
>>
>>        <bb 5> [local count: 437450365]:
>>        _107 = .GOACC_DIM_POS (2);
>>     -  _82 = (int) ivtmp$5_22;
>>     -  _83 = _82 + _107;
>>     -  _47 = *.omp_data_i_44(D).arr;
>>     -  _46 = (sizetype) _83;
>>     -  _45 = _46 * 4;
>>     -  _43 = _47 + _45;
>>     -  _49 = MEM <int> [(int[1025] *)_43];
>>     -  _51 = _2 + _49;
>>     -  _53 = _25 + _51;
>>     -  _103 = _27 + _53;
>>     -  _95 = _29 + _103;
>>     -  MEM <int> [(int[1025] *)_43] = _95;
>>     +  _49 = (int) ivtmp$8_52;
>>     +  _51 = _49 + _107;
>>     +  _103 = *.omp_data_i_44(D).arr;
>>     +  _63 = (sizetype) _51;
>>     +  _41 = _63 * 4;
>>     +  _4 = _103 + _41;
>>     +  _95 = MEM <int> [(int[1025] *)_4];
>>     +  _96 = _2 + _95;
>>     +  _22 = _25 + _96;
>>     +  _105 = _22 + _27;
>>     +  _99 = _29 + _105;
>>     +  MEM <int> [(int[1025] *)_4] = _99;
>>        .UNIQUE (OACC_JOIN, _34, 2);
>>
>>        <bb 6> [local count: 87490071]:
>>     -  _63 = _2 + 1;
>>     -  ivtmp$5_81 = ivtmp$5_22 + 32;
>>     -  ivtmp$6_56 = ivtmp$6_72 + 2;
>>     -  if (_23 != _63)
>>     +  ivtmp$3_47 = ivtmp$3_106 + 1;
>>     +  ivtmp$5_72 = ivtmp$5_87 + 2;
>>     +  ivtmp$8_50 = ivtmp$8_52 + 32;
>>     +  if (_42 != ivtmp$3_47)
>>          goto <bb 4>; [89.00%]
>>        else
>>          goto <bb 7>; [11.00%]
>>
>> ..., and thus the change in diagnostics:
>>
>>      [...]
>>      
>> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63:
>>  note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: 
>> ‘gang’
>>      
>> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63:
>>  note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: 
>> ‘gang’
>>     
>> +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90: In 
>> function ‘t4_._omp_fn.0’:
>>     
>> +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63:
>>  note: variable ‘pt’ adjusted for OpenACC privatization level: ‘gang’
>>     +  131 |   !$acc loop gang private(pt) ! { dg-line l_loop[incr c_loop] }
>>     +      |                                                               ^
>>
>> For '-O2', host compilation begins same as '-O1', and again in
>> 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}' assignments
>> for the new 'struct vec3 pt;' get cleaned out:
>>
>>     --- ./a-private-variables.f90.144t.sink1        2022-04-12 
>> 14:28:19.173520425 +0200
>>     +++ ./a-private-variables.f90.148t.dse3 2022-04-12 14:28:19.173520425 
>> +0200
>>     [...]
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     @@ -97,10 +74,6 @@
>>        goto <bb 3>; [100.00%]
>>
>>        <bb 6> [local count: 35644102]:
>>     -  pt.attr[4] = _29;
>>     -  pt.z = _27;
>>     -  pt.y = _25;
>>     -  pt.x = .offset.15_2;
>>        return;
>>
>>      }
>>     [...]
>>
>> For '-O2', nvptx offload target compilation looks very similar to host
>> compilation, and again in 'a.xnvptx-none.mkoffload.148t.dse3', the
>> 'pt.{x,y,z,attr}' assignments for the new 'struct vec3 pt;' get cleaned
>> out:
>>
>>     --- ./a.xnvptx-none.mkoffload.144t.sink1        2022-04-12 
>> 14:28:19.213520366 +0200
>>     +++ ./a.xnvptx-none.mkoffload.148t.dse3 2022-04-12 14:28:19.213520366 
>> +0200
>>     [...]
>>      __attribute__((oacc function (32, 8, 32), oacc parallel, omp target 
>> entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     @@ -34,13 +25,9 @@
>>
>>        <bb 2> [local count: 7128820]:
>>        _104 = .GOACC_DIM_POS (0);
>>     -  pt.x = _104;
>>        _25 = _104 * 2;
>>     -  pt.y = _25;
>>        _27 = _104 * 4;
>>     -  pt.z = _27;
>>        _29 = _104 * 6;
>>     -  pt.attr[4] = _29;
>>        _34 = .UNIQUE (OACC_FORK, 0, 2);
>>
>>        <bb 3> [local count: 437450365]:
>>
>> ..., so no actual changes in the end.
>>
>> I have not verified other ("higher") optimization levels, but given no
>> change in diagnostics, I suppose the same ("no actual changes") happens
>> for those.
>>
>> Is the '-O1' change/regression unexpected, and should be analyzed, or
>> should we just accept the slightly worse code generation (for '-O1'
>> only), and I accordingly adjust the test case for the change in
>> diagnostics?
>>
>>
>> Grüße
>>  Thomas
>> -----------------
>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
>> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
>> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
>> Registergericht München, HRB 106955
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to