Thanks a lot for tracking down / opening the relevant PRs. About:
> > (6) loop distribution is required to break a dependence. This may
> > already be handled by Sebastian's loop-distribution pass that will
> > be incorporated in 4.4.
> > Here is an example:
> >     for (i__ = 2; i__ <= i__2; ++i__)
> >       {
> >         a[i__] += c__[i__] * d__[i__];
> >         b[i__] = a[i__] + d__[i__] + b[i__ - 1];
> >       }
> > This happens in the loop in line 2136.
> > Need to check if we need to open a missed optimization PR for this.
>
> I don't think that this is a loop distribution issue. The dependence
> between the store to a[i] and the load from a[i] doesn't prevent
> vectorization.

right,

> The problematic one is between the store to b[i] and
> the load from b[i-1] in the second statement.

...which is exactly why loop distribution could make this loop
(partially) vectorizable - separating the first and second statements
into separate loops would allow vectorizing the first of the two
resulting loops (which is probably what icc does - icc reports that
this loop is partially vectorizable).

dorit

Ira Rosen/Haifa/IBM wrote on 17/02/2008 11:42:55:

> Hi,
>
> Dorit Nuzman/Haifa/IBM wrote on 14/02/2008 17:02:45:
>
> > This is an old debt: A while back Tim had sent me a detailed report
> > off line showing which C++ tests (originally from the Dongarra loops
> > suite) were vectorized by current g++ or icpc, or both, as well as
> > when the vectorization by icpc required a pragma, or was partial. I
> > went over the loops that were reported to be vectorized by icc but
> > not by gcc, to see which features we are missing. There are 23 such
> > loops (out of a total of 77). They fall into the following 7 categories:
> >
> > (1) scalar evolution analysis fails with "evolution of base is not affine".
> > This happens in the 3 loops in lines 4267, 4204 and 511.
> > Here is an example:
> >     for (i__ = 1; i__ <= i__2; ++i__)
> >       {
> >         a[i__] = (b[i__] + b[im1] + b[im2]) * .333f;
> >         im2 = im1;
> >         im1 = i__;
> >       }
> > Missed optimization PR to be opened.
>
> I opened PR35224.
>
> >
> > (2) Function calls inside a loop. These are calls to the math
> > functions sin/cos, which I expect would be vectorized if the proper
> > simd math lib was available.
> > This happens in the loop in line 6932.
> > I think there's an open PR for this one (at least for
> > powerpc/Altivec?) - need to look/open.
>
> There is PR22226.
>
> >
> > (3) This one is the most dominant missed optimization: if-conversion
> > is failing to if-convert, most likely due to the very limited
> > handling of loads/stores (i.e. load/store hoisting/sinking is required).
> > This happens in the 13 loops in lines 4085, 4025, 3883, 3818, 3631,
> > 355, 3503, 2942, 877, 6740, 6873, 5191, 7943.
> > There is ongoing work towards addressing this issue - see
> > http://gcc.gnu.org/ml/gcc/2007-07/msg00942.html,
> > http://gcc.gnu.org/ml/gcc/2007-09/msg00308.html.
> > (I think Victor Kaplansky is currently working on this).
> >
> > (4) A scalar variable, whose address is taken outside the loop (in
> > an enclosing outer-loop) is analyzed by the data-references
> > analysis, which fails because it is invariant.
> > Here's an example:
> >     for (nl = 1; nl <= i__1; ++nl)
> >       {
> >         sum = 0.f;
> >         for (i__ = 1; i__ <= i__2; ++i__)
> >           {
> >             a[i__] = c__[i__] + d__[i__];
> >             b[i__] = c__[i__] + e[i__];
> >             sum += a[i__] + b[i__];
> >           }
> >         dummy_ (ld, n, &a[1], &b[1], &c__[1], &d__[1], &e[1], &aa[aa_offset],
> >                 &bb[bb_offset], &cc[cc_offset], &sum);
> >       }
> > (Analysis of 'sum' fails with "FAILED as dr address is invariant".)
> > This happens in the 2 loops in lines 5053 and 332.
> > I think there is a missed optimization PR for this one already. Need
> > to look/open.
>
> The related PRs are PR33245 and PR33244. Also there is a FIXME
> comment in tree-data-ref.c before the failure with "FAILED as dr
> address is invariant" error:
>
>   /* FIXME -- data dependence analysis does not work correctly
>      for objects with
>      invariant addresses.  Let us fail here until the problem is fixed.
>   */
>
> >
> > (5) Reduction and induction that involve multiplication (i.e. 'prod
> > *= CST' or 'prod *= a[i]') are currently not supported by the
> > vectorizer. It should be trivial to add support for this feature
> > (for reduction, it shouldn't be much more than adding a case for
> > MULT_EXPR in tree-vectorizer.c:reduction_code_for_scalar_code, I think).
> > This happens in the 2 loops in lines 4921 and 4632.
> > A missed-optimization PR to be opened.
>
> Opened PR35226.
>
> >
> > (6) loop distribution is required to break a dependence. This may
> > already be handled by Sebastian's loop-distribution pass that will
> > be incorporated in 4.4.
> > Here is an example:
> >     for (i__ = 2; i__ <= i__2; ++i__)
> >       {
> >         a[i__] += c__[i__] * d__[i__];
> >         b[i__] = a[i__] + d__[i__] + b[i__ - 1];
> >       }
> > This happens in the loop in line 2136.
> > Need to check if we need to open a missed optimization PR for this.
>
> I don't think that this is a loop distribution issue. The dependence
> between the store to a[i] and the load from a[i] doesn't prevent
> vectorization. The problematic one is between the store to b[i] and
> the load from b[i-1] in the second statement.
>
> >
> > (7) A dependence, similar to one that would be created by
> > predictive commoning (or even PRE), is present in the loop:
> >     for (i__ = 1; i__ <= i__2; ++i__)
> >       {
> >         a[i__] = (b[i__] + x) * .5f;
> >         x = b[i__];
> >       }
> > This happens in the loop in line 3003.
> > The vectorizer needs to be extended to handle such cases.
> > A missed optimization PR to be opened (if one doesn't exist already).
>
> I opened a new PR - 35229. (PR33244 is somewhat related).
>
> Ira
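
To make the loop-distribution point on item (6) concrete, here is a
hand-written sketch of the transformation in C. It is only an
illustration: the function names loop_fused and loop_distributed, the
simplified parameter names, and the assumption that the four arrays do
not overlap are all hypothetical and not taken from the test suite. It
is not meant to show what GCC's loop-distribution pass or icc actually
emits.

```c
#include <assert.h>

/* Original form of the item (6) loop.  Indexing is 1-based, as in the
   quoted source, so every array must hold at least n + 1 elements.  */
void
loop_fused (int n, float *a, float *b, const float *c, const float *d)
{
  assert (n >= 0);
  for (int i = 2; i <= n; ++i)
    {
      a[i] += c[i] * d[i];
      b[i] = a[i] + d[i] + b[i - 1];
    }
}

/* Hand-distributed form (illustrative only).  Assumes that a, b, c and
   d do not overlap; otherwise splitting the loop could change results.  */
void
loop_distributed (int n, float *a, float *b, const float *c, const float *d)
{
  assert (n >= 0);

  /* First statement on its own: a[i] depends only on data from
     iteration i, so there is no loop-carried dependence here.  */
  for (int i = 2; i <= n; ++i)
    a[i] += c[i] * d[i];

  /* Second statement on its own: b[i] still reads b[i - 1], which the
     previous iteration wrote, so this recurrence stays sequential.  */
  for (int i = 2; i <= n; ++i)
    b[i] = a[i] + d[i] + b[i - 1];
}
```

Only the first of the two resulting loops is free of loop-carried
dependences; the second keeps the b[i - 1] recurrence and would stay
scalar, which is consistent with the loop being reported as partially
vectorizable.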