https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80844
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Richard Biener from comment #1)
> > If OMP SIMD always zeros the vector then it could also emit the maybe easier
> > to optimize
> >
> >   WITH_SIZE_EXPR<_3, D.2841> = {};
>
> It doesn't always zero, it can be pretty arbitrary.

Ah, the memset gets exposed by loop distribution.  Before we have

  <bb 3> [5.67%]:
  # _28 = PHI <_27(13), 0(10)>
  D.2357[_28] = 0.0;
  _27 = _28 + 1;
  if (_15 > _27)
    goto <bb 13>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  <bb 13> [4.82%]:
  goto <bb 3>; [100.00%]

so indeed the other cases will be more "interesting".  For your latest idea
to work we have to make sure the prologue / epilogue loop doesn't get
unrolled / pattern matched.

I'll still look at enhancing memset folding (it's pretty conservative in the
cases it handles).

> For the default
> reductions on integral/floating point types it does zero for +/-/|/^/||
> reductions, but e.g. 1 for */&&, or ~0 for &, or maximum or minimum for min
> or max. For user defined reductions it can be whatever the user requests,
> constructor for some class type, function call, set to arbitrary value etc.
> For other privatization clauses it is again something different
> (uninitialized for private/lastprivate, some other var + some bias for
> linear, ...).
> And then after the simd loop there is again a reduction or something
> similar, but again can be quite complex in the general case. If it helps,
> we could mark the pre-simd and post-simd loops somehow in the loop structure
> or something, but the actual work needs to be done later, especially after
> inlining, including the vectorizer and other passes.
> E.g. for the typical reduction where the vectorizer computes the "simd
> array" in a vector temporary (or collection of them), it would be nice if we
> were able to pattern recognize simple cases and turn those into vector
> reduction patterns.
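For readers less used to GIMPLE dumps: the loop in the dump stores 0.0 into every element D.2357[_28] while _28 stays below _15, and GCC's loop distribution (-ftree-loop-distribute-patterns) is the pass that recognizes such a zeroing loop and replaces it with a memset call. The C below is only a hand-written analogue under that reading; the function names zero_loop and zero_memset are hypothetical and this is not the code the OpenMP lowering actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of the dumped loop:
   D.2357[_28] = 0.0 for each _28 in [0, n). */
void zero_loop(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;
}

/* The form loop distribution can produce: one memset call.
   Storing all-zero bytes is equivalent only because +0.0 is the
   all-zero bit pattern for IEEE 754 doubles. */
void zero_memset(double *a, size_t n)
{
    /* memset needs a valid pointer even for a zero length, so skip
       the call when there is nothing to clear. */
    if (n == 0)
        return;
    assert(a != NULL);
    memset(a, 0, n * sizeof *a);
}
```

The point of the comment above is that this transformation only fits the zero-initialized case; for the non-zero initializers Jakub lists, the pre-simd loop does not match a memset pattern.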
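Jakub's list of initial values for the predefined reductions (0 for + | ^ ||, 1 for * &&, ~0 for &, the type's maximum for min and minimum for max) can be made concrete with a small C sketch of the pre-simd / simd / post-simd structure. It assumes int operands and a hypothetical vectorization factor VF of 4, models the "simd array" as a plain array with one slot per lane, and uses invented names (enum red_op, red_init, red_combine, simd_reduce) that do not reflect GCC internals. The '-' reduction, which also starts at 0, is left out because its per-iteration update subtracts while its combiner adds; user-defined and floating-point reductions are not covered.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names for a subset of the predefined reduction operators. */
enum red_op { RED_ADD, RED_MUL, RED_AND, RED_OR, RED_XOR,
              RED_LAND, RED_LOR, RED_MIN, RED_MAX };

/* Value each private copy (simd-array slot) starts with. */
int red_init(enum red_op op)
{
    switch (op) {
    case RED_ADD: case RED_OR: case RED_XOR: case RED_LOR:
        return 0;
    case RED_MUL: case RED_LAND:
        return 1;
    case RED_AND:
        return ~0;       /* all bits set */
    case RED_MIN:
        return INT_MAX;  /* min starts at the maximum value */
    case RED_MAX:
        return INT_MIN;  /* max starts at the minimum value */
    }
    return 0;            /* not reached for valid operators */
}

/* Combines an accumulator with one more value. */
int red_combine(enum red_op op, int a, int b)
{
    switch (op) {
    case RED_ADD:  return a + b;
    case RED_MUL:  return a * b;
    case RED_AND:  return a & b;
    case RED_OR:   return a | b;
    case RED_XOR:  return a ^ b;
    case RED_LAND: return a && b;
    case RED_LOR:  return a || b;
    case RED_MIN:  return a < b ? a : b;
    case RED_MAX:  return a > b ? a : b;
    }
    return a;
}

#define VF 4  /* hypothetical vectorization factor */

int simd_reduce(enum red_op op, const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int lane[VF];
    for (size_t l = 0; l < VF; l++)   /* pre-simd loop: initialize slots */
        lane[l] = red_init(op);
    for (size_t i = 0; i < n; i++)    /* simd loop: lane i % VF accumulates */
        lane[i % VF] = red_combine(op, lane[i % VF], x[i]);
    int r = red_init(op);             /* post-simd loop: fold the slots */
    for (size_t l = 0; l < VF; l++)
        r = red_combine(op, r, lane[l]);
    return r;
}
```

When the slots live in a vector register rather than memory, the post-simd fold is the kind of loop Jakub suggests pattern-recognizing into a single vector reduction.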