On Mon, Mar 17, 2008 at 3:45 PM, Benoît Jacob <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> I am currently (co-)developing a Free (GPL/LGPL) C++ library for
> vector/matrix math.
>
> A major decision that we need to take is, what to do regarding vectorization
> instructions (SSE). Either we rely on GCC to auto-vectorize, or we control
> explicitly the vectorization using GCC's special primitives. The latter
> solution is of course more difficult, and would to some degree obfuscate our
> source code, so we wish to know whether or not it's really necessary.
>
> GCC 4.3.0 does auto-vectorize our loops, but the resulting code has worse
> performance than a version with unrolled loops and no vectorization. By
> contrast, ICC auto-vectorizes the same loops in a way that makes them
> significantly faster than the unrolled-loops non-vectorized version.
>
> If you want to know, the loops in question typically look like:
> for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> {
>   // some abstract c++ code with deep recursive templates and
>   // deep recursive inline functions, but resulting in only a
>   // few assembly instructions
>   a().b().c().d(i) = x().y().z(i);
> }
>
> As said above, it's crucial for us to be able to get an idea of what to
> expect, because design decisions depend on that. Should we expect large
> improvements regarding autovectorization in 4.3.x, in 4.4 or 4.5 ?

In general GCC's autovectorization capabilities are quite good, though cases
where we miss opportunities do of course exist. There were improvements to
autovectorization in every GCC release and I expect that to continue for
future releases (though I cannot promise anything, as GCC is a volunteer-driven
project). Testcases where we miss optimizations are certainly welcome; often
we don't know of all the corner cases.

If you need to get the absolute most out of your CPU, I recommend providing
special routines tuned for the different CPU families, and I recommend using
the standard intrinsics headers (*mmintrin.h) for this. Of course this comes
at a high cost in maintenance (and initial work), so autovectorization might
prove good enough.

Often, tuning the source for a given compiler has an effect similar to
producing vectorized code manually. Looking at GCC tree dumps and knowing a
bit about GCC internals helps you here ;)

> A roadmap or a GCC developer sharing his thoughts would be very helpful.

Thanks,
Richard.
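
To make the two options discussed above concrete, here is a minimal sketch
contrasting a plain loop left to the autovectorizer with the same operation
written against the standard SSE intrinsics header. This is not code from the
library in question: the function names (add_plain, add_sse, check_agreement)
and the choice of a simple float addition are assumptions for illustration
only. The intrinsics path is guarded so the file still builds where SSE is
unavailable.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // standard SSE intrinsics header
#endif

// Hypothetical example: plain loop, left to the compiler's autovectorizer.
void add_plain(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical example: same operation with SSE intrinsics, four floats per step.
void add_sse(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail; this is also the whole loop when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Sanity check: both versions must agree element by element.
void check_agreement() {
    const std::size_t n = 10;  // not a multiple of 4, so the tail loop runs
    float a[n], b[n], r1[n], r2[n];
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = float(i);
        b[i] = 0.5f * float(i);
    }
    add_plain(a, b, r1, n);
    add_sse(a, b, r2, n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(r1[i] == r2[i]);
    }
}
```

The unaligned load/store variants are used here so the sketch makes no
assumption about how the arrays are allocated; a tuned routine would more
likely guarantee 16-byte alignment and use the aligned forms. Whether the
hand-written version actually beats the plain loop depends on the compiler
and flags, so it has to be measured.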