[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

jakub at gcc dot gnu.org Fri, 14 Nov 2014 07:04:38 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
           Assignee|jakub at gcc dot gnu.org           |unassigned at gcc dot 
gnu.org

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Stupachenko Evgeny from comment #12)
> Created attachment 33963 [details]
> test case where pragma simd disable vectorization
> 
> The following test case compiled with "-Ofast" vectorize the loop in the
> GetXsum function.
> Adding "-fopenmp" leads to failed vectorization due to:
> 
> simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
> D.2329[_7].x = _12;
> 
> It looks like before the patch in this Bug loop was vectorized with -fopenmp.

The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
has invalid inter-iteration dependencies.

That said, even with that, with C it vectorizes fine, while with C++ it
doesn't.

In *.einline the C -> C++ difference is (before that I don't see such):
-  D.1856[_19].x = _24;
-  _26 = &D.1856[_19];
-  _27 = MEM[(const struct XY *)_26].x;
+  D.2352[_19].x = _24;
+  _26 = &D.2352[_19];
+  _40 = MEM[(float *)_26];

In *.ealias the C -> C++ difference is:
-  D.1856[_19].x = _24;
-  _27 = MEM[(const struct XY *)&D.1856][_19].x;
+  D.2352[_19].x = _24;
+  _26 = &D.2352[_19];
+  _40 = MEM[(float *)_26];

and apparently FRE1 handles the former but not the latter.  Richard?
As the struct contains float at that offset, I don't see why FRE1 shouldn't
optimize that to _40 = _24.

Shorter testcase for the FRE1 missed-optimization:
struct S { float a, b; };

float
foo (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = &z[x];
  float *q = (float *) p;
  return *q;
}

(dunno why the inliner handles things differently between C and C++ on the #c12
testcase).  Now, as for vectorizing it even if FRE isn't able to optimize it,
we currently don't support interleaved accesses to the "omp simd array"
attributed arrays, perhaps we could at least some easy cases thereof, and
supposedly we should teach SRA about those too (like, if the arrays aren't
addressable and aren't accesses as whole, but just individual fields, split it
into separate "omp simd array" accesses instead.  In this particular case due
to the FRE missed optimization it is addressable though.
Or perhaps teach fold to gimple folding to fold that:
  q_5 = &z[x_2(D)];
  _6 = *q_5;
back into:
  _6 = z[x_2(D)].x;
?

[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

Reply via email to