[Bug tree-optimization/80844] OpenMP SIMD doesn't know how to efficiently zero a vector (its stores zeros and reloads)

rguenth at gcc dot gnu.org Wed, 24 May 2017 06:01:26 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80844


--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Richard Biener from comment #1)
> > If OMP SIMD always zeros the vector then it could also emit the maybe easier
> > to optimize
> > 
> >   WITH_SIZE_EXPR<_3, D.2841> = {};
> 
> It doesn't always zero, it can be pretty arbitrary.

Ah, the memset gets exposed by loop distribution.  Before we have

  <bb 3> [5.67%]:
  # _28 = PHI <_27(13), 0(10)>
  D.2357[_28] = 0.0;
  _27 = _28 + 1;
  if (_15 > _27)
    goto <bb 13>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  <bb 13> [4.82%]:
  goto <bb 3>; [100.00%]

so indeed the other cases will be more "interesting".

For your latest idea to work we have to make sure the prologue / epilogue
loop doesn't get unrolled / pattern matched.

I'll still look at enhancing memset folding (it's pretty conservative
in the cases it handles).

>  For the default
> reductions on integral/floating point types it does zero for +/-/|/^/||
> reductions, but e.g. 1 for */&&, or ~0 for &, or maximum or minimum for min
> or max.  For user defined reductions it can be whatever the user requests,
> constructor for some class type, function call, set to arbitrary value etc.
> For other privatization clauses it is again something different
> (uninitialized for private/lastprivate, some other var + some bias for
> linear, ...).
> And then after the simd loop there is again a reduction or something
> similar, but again can be quite complex in the general case.  If it helps,
> we could mark the pre-simd and post-simd loops somehow in the loop structure
> or something, but the actual work needs to be done later, especially after
> inlining, including the vectorizer and other passes.
> E.g. for the typical reduction where the vectorizer computes the "simd
> array" in a vector temporary (or collection of them), it would be nice if we
> were able to pattern recognize simple cases and turn those into vector
> reduction patterns.

[Bug tree-optimization/80844] OpenMP SIMD doesn't know how to efficiently zero a vector (its stores zeros and reloads)

Reply via email to