https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
           Assignee|jakub at gcc dot gnu.org    |unassigned at gcc dot gnu.org

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Stupachenko Evgeny from comment #12)
> Created attachment 33963 [details]
> test case where pragma simd disable vectorization
>
> The following test case compiled with "-Ofast" vectorize the loop in the
> GetXsum function.
> Adding "-fopenmp" leads to failed vectorization due to:
>
> simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
> D.2329[_7].x = _12;
>
> It looks like before the patch in this Bug loop was vectorized with -fopenmp.

The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
has invalid inter-iteration dependencies.

That said, even with that, with C it vectorizes fine, while with C++ it
doesn't.
In *.einline the C -> C++ difference is (before that I don't see such):
-  D.1856[_19].x = _24;
-  _26 = &D.1856[_19];
-  _27 = MEM[(const struct XY *)_26].x;
+  D.2352[_19].x = _24;
+  _26 = &D.2352[_19];
+  _40 = MEM[(float *)_26];
In *.ealias the C -> C++ difference is:
-  D.1856[_19].x = _24;
-  _27 = MEM[(const struct XY *)&D.1856][_19].x;
+  D.2352[_19].x = _24;
+  _26 = &D.2352[_19];
+  _40 = MEM[(float *)_26];
and apparently FRE1 handles the former but not the latter.  Richard?
As the struct contains float at that offset, I don't see why FRE1 shouldn't
optimize that to _40 = _24.

Shorter testcase for the FRE1 missed-optimization:
struct S { float a, b; };

float
foo (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = &z[x];
  float *q = (float *) p;
  return *q;
}

(dunno why the inliner handles things differently between C and C++ on the #c12
testcase).
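
To illustrate the reduction clause requirement mentioned at the start of this comment, here is a minimal sketch. The attachment from comment #12 is not reproduced here, so the function name get_x_sum, the accumulator name sum and the loop body are assumptions made only for illustration; only struct XY with a float member x is taken from the dumps above.

```c
/* Hypothetical stand-in for the attached test case; names and loop body
   are assumptions, not the original code.  */
struct XY { float x, y; };

float
get_x_sum (const struct XY *pts, int n)
{
  float sum = 0.0f;
  /* Every iteration reads and writes sum.  Without the reduction clause
     that is an inter-iteration dependency, which makes the simd loop
     invalid.  The clause gives each SIMD lane a private partial sum that
     is combined after the loop.  */
#pragma omp simd reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += pts[i].x;
  return sum;
}
```

The pragma only takes effect when compiling with -fopenmp or -fopenmp-simd; without either option it is ignored and the loop is an ordinary scalar loop with the same result.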
Now, as for vectorizing it even if FRE isn't able to optimize it: we currently
don't support interleaved accesses to the "omp simd array" attributed arrays.
Perhaps we could support at least some easy cases thereof, and supposedly we
should teach SRA about those too (like, if the arrays aren't addressable and
aren't accessed as a whole, but just as individual fields, split them into
separate "omp simd array" accesses instead).  In this particular case, due to
the FRE missed optimization, it is addressable though.

Or perhaps teach gimple folding to fold that:
  q_5 = &z[x_2(D)];
  _6 = *q_5;
back into:
  _6 = z[x_2(D)].a;
?
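
At the source level, the fold suggested above corresponds to treating the following two functions as equivalent. This is only a sketch built from the shorter testcase; the function names load_via_cast and load_via_field are made up for illustration.

```c
struct S { float a, b; };

/* Store to a field, then load it back through a float * obtained by
   casting the address of the element.  This is the form described above
   as not being optimized by FRE1.  */
float
load_via_cast (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = &z[x];
  float *q = (float *) p;
  return *q;
}

/* What the load would look like after the suggested fold: a direct field
   access, so the address of z is never taken.  */
float
load_via_field (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  return z[x].a;
}
```

Both functions return y for any in-range x. Compiling with -O2 -fdump-tree-fre1 and comparing the two function bodies in the resulting dump should show whether the first form is simplified the same way as the second.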