[EMAIL PROTECTED] wrote on 17/03/2008 19:33:23:

> I have looked more closely at the messages generated by the gcc 4.3
> vectorizer
> and it seems that they fall into two categories:
>
> 1) complaining about alignment.
>
> For example:
>
> Unknown alignment for access: D.33485
> Unknown alignment for access: m

These messages do not necessarily mean that the loop can't be vectorized: we
can handle unknown alignment with loop peeling and loop versioning.
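
To illustrate what loop peeling for alignment amounts to, here is a
hand-written sketch of the transformation. It is only an illustration under
assumptions: the name scale_peeled and the 16-byte (SSE) boundary are chosen
for the example, and the code GCC actually generates will differ. Loop
versioning, by contrast, keeps two copies of the loop and picks one at run
time after checking the alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): scales n doubles in place,
// laid out the way a loop looks after peeling for alignment.
void scale_peeled(double* data, std::size_t n, double factor)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(data);
    std::size_t i = 0;

    // Prologue (the "peeled" iterations): one element at a time until
    // data + i sits on a 16-byte boundary, or the data runs out.
    while (i < n && (addr + i * sizeof(double)) % 16 != 0) {
        data[i] *= factor;
        ++i;
    }

    // Main loop: two doubles per iteration, starting at an aligned address.
    // This is the part a vectorizer could turn into aligned 128-bit accesses.
    for (; i + 1 < n; i += 2) {
        data[i]     *= factor;
        data[i + 1] *= factor;
    }

    // Epilogue: at most one leftover element.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The result is the same as a plain loop over all n elements; only the order in
which the work is split up changes.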

>
> I don't understand, as all my data is statically allocated doubles
> (no dynamic
> memory allocation) and I am using -malign-double. What more can I do?
>
> 2) complaining about "possible dependence" between some data and itself
>
> Example:
>
> not vectorized, possible dependence between data-refs
> m.m_storage.m_data[D.43225_112] and m.m_storage.m_data[D.43225_112]

These are probably two distinct data-refs, a store and a load, that access
the same location, rather than one data-ref compared against itself.
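
As a sketch of how one source expression can produce two data-refs, consider
an in-place update. The Matrix type and the names kSize and add_in_place are
hypothetical stand-ins, since the actual code is not shown in this thread, and
this is not claimed to reproduce the message above.

```cpp
#include <cstddef>

// Hypothetical size and storage type, assumed for the example only.
const std::size_t kSize = 16;

struct Matrix {
    double m_data[kSize];
};

// Each iteration reads m.m_data[i] and then writes m.m_data[i].
// That is two data references (one load, one store) with an identical
// subscript, so both print the same way in a dependence message.
void add_in_place(Matrix& m, const Matrix& rhs)
{
    for (std::size_t i = 0; i < kSize; ++i) {
        m.m_data[i] = m.m_data[i] + rhs.m_data[i];
    }
}
```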

As has already been said, the best thing to do is to open a PR with a
testcase, so that we can fully analyze it and answer all the questions.

Ira

>
>
> I am wondering what to do about all that? Surely there must be documentation
> about the vectorizer and its messages somewhere but I can't find it?
>
> Cheers,
> Benoit
>
>
> On Monday 17 March 2008 15:59:21 Richard Guenther wrote:
> > On Mon, Mar 17, 2008 at 3:45 PM, Benoît Jacob <[EMAIL PROTECTED]> wrote:
> > > Dear All,
> > >
> > >  I am currently (co-)developing a Free (GPL/LGPL) C++ library for
> > > vector/matrix math.
> > >
> > >  A major decision that we need to take is, what to do regarding
> > > vectorization instructions (SSE). Either we rely on GCC to
> > > auto-vectorize, or we control explicitly the vectorization using GCC's
> > > special primitives. The latter solution is of course more difficult, and
> > > would to some degree obfuscate our source code, so we wish to know
> > > whether or not it's really necessary.
> > >
> > >  GCC 4.3.0 does auto-vectorize our loops, but the resulting code has
> > > worse performance than a version with unrolled loops and no
> > > vectorization. By contrast, ICC auto-vectorizes the same loops in a way
> > > that makes them significantly faster than the unrolled-loops
> > > non-vectorized version.
> > >
> > >  If you want to know, the loops in question typically look like:
> > >  for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> > >  {
> > >         // some abstract c++ code with deep recursive templates and
> > >         // deep recursive inline functions, but resulting in only a
> > >         // few assembly instructions
> > >         a().b().c().d(i) = x().y().z(i);
> > >  }
> > >
> > >  As said above, it's crucial for us to be able to get an idea of what to
> > >  expect, because design decisions depend on that. Should we expect large
> > >  improvements regarding autovectorization in 4.3.x, in 4.4 or 4.5?
> >
> > In general GCC's autovectorization capabilities are quite good, though
> > cases where we miss opportunities do of course exist.  There were
> > improvements regarding autovectorization capabilities in every GCC release
> > and I expect that to continue for future releases (though I cannot promise
> > anything as GCC is a volunteer driven project - but certainly testcases
> > where we miss optimizations are welcome - often we don't know of all
> > corner cases).
> >
> > If you need to get the absolute most out of your CPU I recommend
> > providing special routines tuned for the different CPU families, and I
> > recommend the use of the standard intrinsics headers (*mmintrin.h) for
> > this.  Of course this comes at a high cost of maintenance (and initial
> > work), so autovectorization might prove good enough.  Often tuning the
> > source for a given compiler has a similar effect to producing vectorized
> > code manually.  Looking at GCC tree dumps and knowing a bit about
> > GCC internals helps you here ;)
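
For a sense of what the intrinsics route looks like, here is a minimal sketch
using the SSE2 header emmintrin.h. The routine name add_arrays is made up for
the example, and the unaligned load/store variants are used so that no
alignment assumption is needed; a tuned routine would likely differ. When SSE2
is not available at compile time, only the scalar loop is built.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: __m128d, _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd
#endif

// Hypothetical routine: dst[i] = a[i] + b[i] for n doubles.
void add_arrays(double* dst, const double* a, const double* b, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    // Two doubles per iteration in one 128-bit register.
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar tail, or the whole loop when SSE2 is not enabled.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The maintenance cost mentioned above shows up quickly: each CPU family and
each element type needs its own variant of the vector loop.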
> >
> > >  A roadmap or a GCC developer sharing his thoughts would be very helpful.
> >
> > Thanks,
> > Richard.
>
>
> [attachment "signature.asc" deleted by Ira Rosen/Haifa/IBM]
