I have looked more closely at the messages generated by the gcc 4.3 vectorizer and it seems that they fall into two categories:
1) complaining about alignment. For example:

Unknown alignment for access: D.33485
Unknown alignment for access: m

I don't understand, as all my data is statically allocated doubles (no dynamic memory allocation) and I am using -malign-double. What more can I do?

2) complaining about "possible dependence" between some data and itself. Example:

not vectorized, possible dependence between data-refs m.m_storage.m_data[D.43225_112] and m.m_storage.m_data[D.43225_112]

I am wondering what to do about all that? Surely there must be documentation about the vectorizer and its messages somewhere, but I can't find it?

Cheers,
Benoit

On Monday 17 March 2008 15:59:21 Richard Guenther wrote:
> On Mon, Mar 17, 2008 at 3:45 PM, Benoît Jacob <[EMAIL PROTECTED]> wrote:
> > Dear All,
> >
> > I am currently (co-)developing a Free (GPL/LGPL) C++ library for
> > vector/matrix math.
> >
> > A major decision that we need to take is, what to do regarding
> > vectorization instructions (SSE). Either we rely on GCC to
> > auto-vectorize, or we control explicitly the vectorization using GCC's
> > special primitives. The latter solution is of course more difficult, and
> > would to some degree obfuscate our source code, so we wish to know
> > whether or not it's really necessary.
> >
> > GCC 4.3.0 does auto-vectorize our loops, but the resulting code has
> > worse performance than a version with unrolled loops and no
> > vectorization. By contrast, ICC auto-vectorizes the same loops in a way
> > that makes them significantly faster than the unrolled-loops
> > non-vectorized version.
> >
> > If you want to know, the loops in question typically look like:
> > for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> > {
> >   // some abstract c++ code with deep recursive templates and
> >   // deep recursive inline functions, but resulting in only a
> >   // few assembly instructions
> >   a().b().c().d(i) = x().y().z(i);
> > }
> >
> > As said above, it's crucial for us to be able to get an idea of what to
> > expect, because design decisions depend on that. Should we expect large
> > improvements regarding autovectorization in 4.3.x, in 4.4 or 4.5 ?
>
> In general GCC's autovectorization capabilities are quite good, cases
> where we miss opportunities do of course exist. There were improvements
> regarding autovectorization capabilities in every GCC release and I expect
> that to continue for future releases (though I cannot promise anything
> as GCC is a volunteer driven project - but certainly testcases where we
> miss optimizations are welcome - often we don't know of all corner cases).
>
> If you require to get the absolute most out of your CPU I recommend to
> provide special routines tuned for the different CPU families and I
> recommend the use of the standard intrinsics headers (*mmintrin.h) for
> this. Of course this comes at a high cost of maintenance (and initial
> work), so autovectorization might prove good enough. Often tuning the
> source for a given compiler has a similar effect to producing vectorized
> code manually. Looking at GCC tree dumps and knowing a bit about
> GCC internals helps you here ;)
>
> > A roadmap or a GCC developer sharing his thoughts would be very helpful.
>
> Thanks,
> Richard.
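
A minimal sketch relating to the two vectorizer diagnostics quoted at the top of this message. It rests on assumptions that are not from the thread: the names N, a, b, scaled_copy, is_aligned16 and run_example are hypothetical, and nothing here has been checked against GCC 4.3 output. The idea is that -malign-double only aligns doubles on a two-word (8-byte) boundary, while aligned SSE2 loads and stores need 16 bytes, so the sketch asks for 16-byte alignment explicitly with GCC's aligned attribute. It also marks the pointers __restrict__ to state that source and destination do not overlap. Whether either change would silence these particular messages is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical compile-time trip count, standing in for COMPILE_TIME_CONSTANT.
const std::size_t N = 4;

// Hypothetical statically allocated storage. The aligned attribute is a GCC
// extension; 16 bytes is what aligned SSE2 loads and stores require.
static double a[N] __attribute__((aligned(16)));
static double b[N] __attribute__((aligned(16))) = {1.0, 2.0, 3.0, 4.0};

// __restrict__ (also a GCC extension) promises the compiler that dst and src
// never overlap. This is an assumption about the data, not something checked.
void scaled_copy(double* __restrict__ dst, const double* __restrict__ src,
                 double k)
{
    for (std::size_t i = 0; i < N; ++i)
        dst[i] = k * src[i];
}

// Returns true when p sits on a 16-byte boundary.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<uintptr_t>(p) % 16 == 0;
}

// Checks the alignment of both arrays, then writes 2 * b[i] into a[i]
// and returns the last element written.
double run_example()
{
    assert(is_aligned16(a) && is_aligned16(b));
    scaled_copy(a, b, 2.0);
    return a[N - 1];
}
```

Calling run_example() fills a with twice the values of b. Rebuilding with the vectorizer's verbose messages enabled would show whether the loop is then vectorized without the alignment complaint.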