Thanks a lot for tracking down / opening the relevant PRs.

about:

> > (6) loop distribution is required to break a dependence. This may
> > already be handled by Sebastian's loop-distribution pass that will
> > be incorporated in 4.4.
> > Here is an example:
> >  for (i__ = 2; i__ <= i__2; ++i__)
> >         {
> >           a[i__] += c__[i__] * d__[i__];
> >           b[i__] = a[i__] + d__[i__] + b[i__ - 1];
> >         }
> > This happens in the loop in line 2136.
> > Need to check if we need to open a missed optimization PR for this.
>
> I don't think that this is a loop distribution issue. The dependence
> between the store to a[i] and the load from a[i] doesn't prevent
> vectorization.

right,

> The problematic one is between the store to b[i] and
> the load from b[i-1] in the second statement.

...which is exactly why loop distribution could make this loop (partially)
vectorizable - separating the first and second statements into separate
loops would allow vectorizing the first of the two resulting loops (which
is probably what icc does - icc reports that this loop is partially
vectorizable).
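
To make the point concrete, here is a minimal sketch of what the
distributed form could look like. The function names (fused,
distributed, check_distribution) and the test data are made up for
illustration; this is not what the loop-distribution pass or icc
actually emits, only the source-level shape of the transformation.

```c
#include <assert.h>

#define N 16  /* hypothetical array bound, for illustration only */

/* Original shape: both statements share one loop, so the recurrence
   on b (b[i] needs b[i-1]) keeps the whole loop scalar.  */
static void fused (float *a, float *b, const float *c, const float *d, int n)
{
  for (int i = 2; i <= n; ++i)
    {
      a[i] += c[i] * d[i];
      b[i] = a[i] + d[i] + b[i - 1];
    }
}

/* Distributed shape: the first loop carries no dependence across
   iterations and is a vectorization candidate; the second loop keeps
   the recurrence on b and stays scalar.  */
static void distributed (float *a, float *b, const float *c, const float *d,
                         int n)
{
  for (int i = 2; i <= n; ++i)
    a[i] += c[i] * d[i];

  for (int i = 2; i <= n; ++i)
    b[i] = a[i] + d[i] + b[i - 1];
}

/* Returns 1 if both shapes produce identical a and b, 0 otherwise.  */
static int check_distribution (int n)
{
  float a1[N + 1], b1[N + 1], a2[N + 1], b2[N + 1], c[N + 1], d[N + 1];

  assert (n >= 1 && n <= N);

  /* Small values chosen so that every result is exact in float.  */
  for (int i = 0; i <= N; ++i)
    {
      a1[i] = a2[i] = (float) i;
      b1[i] = b2[i] = 1.0f;
      c[i] = 0.5f * (float) i;
      d[i] = 2.0f;
    }

  fused (a1, b1, c, d, n);
  distributed (a2, b2, c, d, n);

  for (int i = 0; i <= N; ++i)
    if (a1[i] != a2[i] || b1[i] != b2[i])
      return 0;
  return 1;
}
```

Distribution is legal here because the only dependence between the two
statements runs forward, from the store to a[i] to the load of a[i] in
the same iteration, so running all of the first statement before any of
the second preserves it.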

dorit


Ira Rosen/Haifa/IBM wrote on 17/02/2008 11:42:55:

> Hi,
>
> Dorit Nuzman/Haifa/IBM wrote on 14/02/2008 17:02:45:
>
> > This is an old debt: A while back Tim had sent me a detailed report
> > offline showing which C++ tests (originally from the Dongarra loops
> > suite) were vectorized by current g++ or icpc, or both, as well as
> > when the vectorization by icpc required a pragma, or was partial. I
> > went over the loops that were reported to be vectorized by icc but
> > not by gcc, to see which features we are missing. There are 23 such
> > loops (out of a total of 77). They fall into the following 7 categories:
> >
> > (1) scalar evolution analysis fails with "evolution of base is not
> > affine".
> > This happens in the 3 loops in lines 4267, 4204 and 511.
> > Here is an example:
> >  for (i__ = 1; i__ <= i__2; ++i__)
> >         {
> >           a[i__] = (b[i__] + b[im1] + b[im2]) * .333f;
> >           im2 = im1;
> >           im1 = i__;
> >         }
> > Missed optimization PR to be opened.
>
> I opened PR35224.
>
> >
> > (2) Function calls inside a loop. These are calls to the math
> > functions sin/cos, which I expect would be vectorized if the proper
> > simd math lib was available.
> > This happens in the loop in line 6932.
> > I think there's an open PR for this one (at least for
> > powerpc/Altivec?) - need to look/open.
>
> There is PR22226.
>
> >
> > (3) This one is the most dominant missed optimization: if-conversion
> > is failing to if-convert, most likely due to the very limited
> > handling of loads/stores (i.e. load/store hoisting/sinking is
> > required).
> > This happens in the 13 loops in lines 4085, 4025, 3883, 3818, 3631,
> > 355, 3503, 2942, 877, 6740, 6873, 5191, 7943.
> > There is ongoing work towards addressing this issue - see
> > http://gcc.gnu.org/ml/gcc/2007-07/msg00942.html,
> > http://gcc.gnu.org/ml/gcc/2007-09/msg00308.html. (I think Victor
> > Kaplansky is currently working on this).
> >
> > (4) A scalar variable, whose address is taken outside the loop (in
> > an enclosing outer-loop) is analyzed by the data-references
> > analysis, which fails because it is invariant.
> > Here's an example:
> >   for (nl = 1; nl <= i__1; ++nl)
> >     {
> >       sum = 0.f;
> >       for (i__ = 1; i__ <= i__2; ++i__)
> >         {
> >           a[i__] = c__[i__] + d__[i__];
> >           b[i__] = c__[i__] + e[i__];
> >           sum += a[i__] + b[i__];
> >         }
> >       dummy_ (ld, n, &a[1], &b[1], &c__[1], &d__[1], &e[1],
> >               &aa[aa_offset], &bb[bb_offset], &cc[cc_offset], &sum);
> >     }
> > (Analysis of 'sum' fails with "FAILED as dr address is invariant".)
> > This happens in the 2 loops in lines 5053 and 332.
> > I think there is a missed optimization PR for this one already. need
> > to look/open.
> >
>
> The related PRs are PR33245 and PR33244. Also there is a FIXME
> comment in tree-data-ref.c before the failure with "FAILED as dr
> address is invariant" error:
>
>       /* FIXME -- data dependence analysis does not work correctly for
>          objects with invariant addresses.  Let us fail here until the
>          problem is fixed.  */
>
>
> > (5) Reduction and induction that involve multiplication (i.e. 'prod
> > *= CST' or 'prod *= a[i]') are currently not supported by the
> > vectorizer. It should be trivial to add support for this feature
> > (for reduction, it shouldn't be much more than adding a case for
> > MULT_EXPR in tree-vectorizer.c:reduction_code_for_scalar_code, I
> > think).
> > This happens in the 2 loops in lines 4921 and 4632.
> > A missed-optimization PR to be opened.
>
> Opened PR35226.
>
> >
> > (6) loop distribution is required to break a dependence. This may
> > already be handled by Sebastian's loop-distribution pass that will
> > be incorporated in 4.4.
> > Here is an example:
> >  for (i__ = 2; i__ <= i__2; ++i__)
> >         {
> >           a[i__] += c__[i__] * d__[i__];
> >           b[i__] = a[i__] + d__[i__] + b[i__ - 1];
> >         }
> > This happens in the loop in line 2136.
> > Need to check if we need to open a missed optimization PR for this.
>
> I don't think that this is a loop distribution issue. The dependence
> between the store to a[i] and the load from a[i] doesn't prevent
> vectorization. The problematic one is between the store to b[i] and
> the load from b[i-1] in the second statement.
>
> >
> > (7) A dependence, similar to one that would be created by
> > predictive commoning (or even PRE), is present in the loop:
> >  for (i__ = 1; i__ <= i__2; ++i__)
> >         {
> >           a[i__] = (b[i__] + x) * .5f;
> >           x = b[i__];
> >         }
> > This happens in the loop in line 3003.
> > The vectorizer needs to be extended to handle such cases.
> > A missed optimization PR to be opened (if one doesn't exist already).
>
> I opened a new PR - 35229. (PR33244 is somewhat related).
>
> Ira
