[Bug tree-optimization/87105] Autovectorization [X86, SSE2, AVX2, DoublePrecision]

rguenth at gcc dot gnu.org Tue, 23 Oct 2018 01:44:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the original testcase there's also the issue that the bez[] accesses end up
as

  _68 = MEM[(double *)bez_27(D) + 16B];
  _69 = MEM[(double *)bez_27(D) + 24B];

and thus they may alias the earlier stores

  bBox_33(D)->x0 = iftmp.1_190;
  bBox_33(D)->y0 = iftmp.1_193;
  bBox_33(D)->x1 = iftmp.0_196;
  bBox_33(D)->y1 = iftmp.0_199;

which we consequently fail to eliminate as dead stores against the final ones

  iftmp.1_49 = MIN_EXPR <_119, iftmp.1_190>;
  bBox_33(D)->x0 = iftmp.1_49;
  iftmp.1_31 = MIN_EXPR <_118, iftmp.1_193>;
  bBox_33(D)->y0 = iftmp.1_31;
  iftmp.0_102 = MAX_EXPR <_119, iftmp.0_196>;
  bBox_33(D)->x1 = iftmp.0_102;
  iftmp.0_105 = MAX_EXPR <_118, iftmp.0_199>;
  bBox_33(D)->y1 = iftmp.0_105;

If we eliminated the earlier ones, vectorization would have had a chance
here.  That's from

// Linear interpolation, works with points as well.
template<typename V, typename T = double>
inline V lerp(const V& a, const V& b, const T& t) noexcept {
  return (a * (1.0 - t)) + (b * t);
}

and

// Min/Max - different semantics compared to std.
template<typename T> constexpr T myMin(const T& a, const T& b) noexcept
{ return b < a ? b : a; }
template<typename T> constexpr T myMax(const T& a, const T& b) noexcept
{ return a < b ? b : a; }

which take their arguments as references to double.
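A minimal C++ sketch of the pattern described above. The comment does not quote the full testcase, so Point, BBox, the Point arithmetic operators and the functions bboxStoreTwice, bboxLocals and demoBox are hypothetical stand-ins; only lerp, myMin and myMax are copied from the comment. bboxStoreTwice mirrors the dumps (box stores, then bez[] loads, then the final MIN/MAX stores); bboxLocals keeps the running box in locals and stores it once. The two agree only when bez[] and *bBox do not overlap, which is exactly what the compiler cannot assume for the first version. Nothing here claims that either variant vectorizes with a particular GCC release.

```cpp
#include <cassert>

// Hypothetical stand-ins for the testcase's types (not shown in the comment).
struct Point { double x, y; };
struct BBox { double x0, y0, x1, y1; };

inline Point operator*(const Point& p, double s) noexcept { return {p.x * s, p.y * s}; }
inline Point operator+(const Point& a, const Point& b) noexcept { return {a.x + b.x, a.y + b.y}; }

// Helpers as quoted in the comment: every operand is taken by const reference.
template<typename V, typename T = double>
inline V lerp(const V& a, const V& b, const T& t) noexcept {
  return (a * (1.0 - t)) + (b * t);
}
template<typename T> constexpr T myMin(const T& a, const T& b) noexcept
{ return b < a ? b : a; }
template<typename T> constexpr T myMax(const T& a, const T& b) noexcept
{ return a < b ? b : a; }

// Shape suggested by the dumps: box stores first, then bez[] loads, then the
// final MIN/MAX stores. If the bez[] loads may alias *bBox, the first four
// stores cannot be removed as dead.
void bboxStoreTwice(const Point* bez, double t, BBox* bBox) {
  bBox->x0 = myMin(bez[0].x, bez[2].x);
  bBox->y0 = myMin(bez[0].y, bez[2].y);
  bBox->x1 = myMax(bez[0].x, bez[2].x);
  bBox->y1 = myMax(bez[0].y, bez[2].y);
  const Point p = lerp(bez[0], bez[1], t);  // bez[] loads after the stores
  bBox->x0 = myMin(p.x, bBox->x0);
  bBox->y0 = myMin(p.y, bBox->y0);
  bBox->x1 = myMax(p.x, bBox->x1);
  bBox->y1 = myMax(p.y, bBox->y1);
}

// Variant: keep the running box in locals and store it once at the end.
// Equivalent to bboxStoreTwice only if bez[] and *bBox do not overlap.
void bboxLocals(const Point* bez, double t, BBox* bBox) {
  const double x0 = myMin(bez[0].x, bez[2].x);
  const double y0 = myMin(bez[0].y, bez[2].y);
  const double x1 = myMax(bez[0].x, bez[2].x);
  const double y1 = myMax(bez[0].y, bez[2].y);
  const Point p = lerp(bez[0], bez[1], t);
  *bBox = BBox{myMin(p.x, x0), myMin(p.y, y0), myMax(p.x, x1), myMax(p.y, y1)};
}

// Usage: on non-overlapping inputs both variants compute the same box.
inline BBox demoBox() {
  const Point bez[3] = {{0.0, 0.0}, {6.0, -2.0}, {2.0, 6.0}};
  BBox a{}, b{};
  bboxStoreTwice(bez, 0.5, &a);
  bboxLocals(bez, 0.5, &b);
  assert(a.x0 == b.x0 && a.y0 == b.y0 && a.x1 == b.x1 && a.y1 == b.y1);
  return a;
}
```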

It is also caused by IPA SRA decomposing the by-reference Point arguments
into

  _17 = MEM[(double *)b_2(D)];
  _18 = MEM[(double *)b_2(D) + 8B];
  _19 = MEM[(double *)&t];
  _20 = MEM[(double *)&t + 8B];
  D.19260 = _ZmlRK5PointS1_.isra.5 (_17, _18, _19, _20);

That's quite bad for TBAA: the function itself uses b->x, etc. to access the
memory, whereas the caller now loads through plain (double *) accesses.
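To make the IPA SRA point concrete, here is a hypothetical source-level picture of the transform. The mangled name _ZmlRK5PointS1_ demangles to operator*(Point const&, Point const&); the component-wise body below is an assumption, as are the names mulIsraLike, callerIsraLike and demoMul. mulIsraLike stands in for the compiler-generated .isra.5 clone that receives four doubles instead of two references. This is only an analogy for what the pass does internally, not code GCC emits and not a proposed fix.

```cpp
#include <cassert>

struct Point { double x, y; };  // hypothetical: two doubles, 16 bytes

// Original shape: both operands by const reference (mangled _ZmlRK5PointS1_).
// The component-wise product is an assumed body.
inline Point operator*(const Point& a, const Point& b) noexcept {
  return {a.x * b.x, a.y * b.y};
}

// Hypothetical analogue of the .isra clone: the Point references are gone and
// the four components arrive as scalar doubles.
inline Point mulIsraLike(double ax, double ay, double bx, double by) noexcept {
  return {ax * bx, ay * by};
}

// The caller now performs the loads itself; these correspond to _17.._20 in
// the dump above, which appear there as MEM[(double *)...] accesses.
inline Point callerIsraLike(const Point* b, const Point& t) noexcept {
  return mulIsraLike(b->x, b->y, t.x, t.y);
}

// Usage: both paths yield the same product.
inline Point demoMul() {
  const Point b{2.0, 3.0}, t{0.5, 4.0};
  const Point viaRef = b * t;
  const Point viaScalars = callerIsraLike(&b, t);
  assert(viaRef.x == viaScalars.x && viaRef.y == viaScalars.y);
  return viaRef;
}
```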
