[EMAIL PROTECTED] wrote on 17/03/2008 19:33:23:

> I have looked more closely at the messages generated by the gcc 4.3
> vectorizer and it seems that they fall into two categories:
>
> 1) complaining about alignment.
>
> For example:
>
> Unknown alignment for access: D.33485
> Unknown alignment for access: m
These do not necessarily mean that the loop can't be vectorized - we can
handle unknown alignment with loop peeling and loop versioning.

> I don't understand, as all my data is statically allocated doubles
> (no dynamic memory allocation) and I am using -malign-double. What
> more can I do?
>
> 2) complaining about "possible dependence" between some data and itself
>
> Example:
>
> not vectorized, possible dependence between data-refs
> m.m_storage.m_data[D.43225_112] and m.m_storage.m_data[D.43225_112]

These two data-refs are probably a store and a load to the same place, not
the same data-ref.

As has already been said, the best thing to do is to open a PR with a
testcase, so we can fully analyze it and answer all the questions.

Ira

> I am wondering what to do about all that? Surely there must be
> documentation about the vectorizer and its messages somewhere but I
> can't find it?
>
> Cheers,
> Benoit
>
>
> On Monday 17 March 2008 15:59:21 Richard Guenther wrote:
> > On Mon, Mar 17, 2008 at 3:45 PM, Benoît Jacob <[EMAIL PROTECTED]> wrote:
> > > Dear All,
> > >
> > > I am currently (co-)developing a Free (GPL/LGPL) C++ library for
> > > vector/matrix math.
> > >
> > > A major decision that we need to take is, what to do regarding
> > > vectorization instructions (SSE). Either we rely on GCC to
> > > auto-vectorize, or we control explicitly the vectorization using GCC's
> > > special primitives. The latter solution is of course more difficult, and
> > > would to some degree obfuscate our source code, so we wish to know
> > > whether or not it's really necessary.
> > >
> > > GCC 4.3.0 does auto-vectorize our loops, but the resulting code has
> > > worse performance than a version with unrolled loops and no
> > > vectorization. By contrast, ICC auto-vectorizes the same loops in a way
> > > that makes them significantly faster than the unrolled-loops
> > > non-vectorized version.
> > >
> > > If you want to know, the loops in question typically look like:
> > > for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> > > {
> > >   // some abstract c++ code with deep recursive templates and
> > >   // deep recursive inline functions, but resulting in only a
> > >   // few assembly instructions
> > >   a().b().c().d(i) = x().y().z(i);
> > > }
> > >
> > > As said above, it's crucial for us to be able to get an idea of what to
> > > expect, because design decisions depend on that. Should we expect large
> > > improvements regarding autovectorization in 4.3.x, in 4.4 or 4.5 ?
> >
> > In general GCC's autovectorization capabilities are quite good; cases
> > where we miss opportunities do of course exist. There were improvements
> > regarding autovectorization capabilities in every GCC release and I expect
> > that to continue for future releases (though I cannot promise anything
> > as GCC is a volunteer driven project - but certainly testcases where we
> > miss optimizations are welcome - often we don't know of all corner cases).
> >
> > If you need to get the absolute most out of your CPU I recommend
> > providing special routines tuned for the different CPU families and I
> > recommend the use of the standard intrinsics headers (*mmintrin.h) for
> > this. Of course this comes at a high cost of maintenance (and initial
> > work), so autovectorization might prove good enough. Often tuning the
> > source for a given compiler has a similar effect to producing vectorized
> > code manually. Looking at GCC tree dumps and knowing a bit about
> > GCC internals helps you here ;)
> >
> > > A roadmap or a GCC developer sharing his thoughts would be very helpful.
> >
> > Thanks,
> > Richard.

[attachment "signature.asc" deleted by Ira Rosen/Haifa/IBM]
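
For anyone preparing the kind of testcase requested in the reply, a reduced
example might look roughly like the sketch below. It is only an illustration:
the names Storage, scale_auto, scale_sse2 and N are hypothetical and are not
taken from the library under discussion. The first function is a plain loop
with a compile-time constant trip count that loads and stores the same array
element, which is the load/store pair the "possible dependence" message
appears to describe. The second shows what the explicit route through the
SSE2 intrinsics header could look like. The aligned attribute is a GCC
extension and is an assumption about how the storage might be declared.

```cpp
// Hypothetical reduced testcase; all names here are illustrative only.
#ifdef __SSE2__
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_load_pd, ...
#endif

const int N = 4;  // compile-time constant trip count, as in the original loops

// Statically allocated doubles with explicit 16-byte alignment (GCC
// extension), so the alignment of each access is visible to the compiler.
struct Storage {
    double m_data[N] __attribute__((aligned(16)));
};

// Candidate for auto-vectorization: every iteration loads and then stores
// the same element m.m_data[i].
void scale_auto(Storage& m, double s) {
    for (int i = 0; i < N; ++i)
        m.m_data[i] = m.m_data[i] * s;
}

// Hand-written variant: explicit SSE2 when available, scalar fallback
// otherwise. Assumes N is even and m_data is 16-byte aligned.
void scale_sse2(Storage& m, double s) {
#ifdef __SSE2__
    const __m128d vs = _mm_set1_pd(s);
    for (int i = 0; i < N; i += 2) {
        __m128d v = _mm_load_pd(&m.m_data[i]);          // aligned load, 2 doubles
        _mm_store_pd(&m.m_data[i], _mm_mul_pd(v, vs));  // aligned store
    }
#else
    scale_auto(m, s);
#endif
}
```

With GCC 4.3, compiling such a file with -O2 -ftree-vectorize -msse2
-ftree-vectorizer-verbose=5 should print the vectorizer's reasoning for each
loop; newer GCC releases report the same kind of information through the
-fopt-info family of options instead. Comparing the two functions in the
generated assembly is one way to judge whether the hand-written version is
worth its maintenance cost.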