Dear All,

I am currently (co-)developing a Free (GPL/LGPL) C++ library for vector/matrix math.

A major decision we need to make is how to handle SIMD vectorization (SSE). Either we rely on GCC to auto-vectorize, or we control the vectorization explicitly using GCC's special primitives. The latter solution is of course more difficult and would to some degree obfuscate our source code, so we wish to know whether it is really necessary.

GCC 4.3.0 does auto-vectorize our loops, but the resulting code performs worse than a version with unrolled loops and no vectorization. By contrast, ICC auto-vectorizes the same loops in a way that makes them significantly faster than the unrolled, non-vectorized version.

If you want to know, the loops in question typically look like:

    for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
    {
      // some abstract c++ code with deep recursive templates and
      // deep recursive inline functions, but resulting in only a
      // few assembly instructions
      a().b().c().d(i) = x().y().z(i);
    }

As said above, it is crucial for us to get an idea of what to expect, because design decisions depend on it. Should we expect large improvements in auto-vectorization in 4.3.x, 4.4, or 4.5? A roadmap, or a GCC developer sharing their thoughts, would be very helpful.

Cheers,
Benoit

P.S. I have noticed huge improvements in GCC recently and would like to thank all the developers for that. This is what makes me hope that GCC might soon handle auto-vectorization in a way that allows me to rely on it!
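P.P.S. To make the two options concrete, here is a minimal sketch of the trade-off, not code from our library. The names add_scalar, add_explicit, v4sf and the size N are hypothetical and chosen only for illustration. It assumes a GCC-compatible compiler (the vector_size attribute is a GCC extension) and a size that is a multiple of 4.

```cpp
#include <cstddef>
#include <cstring>

// GCC vector extension: a pack of four floats (16 bytes, the width of
// one SSE register). This typedef name is hypothetical.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical compile-time size; assumed to be a multiple of 4.
const std::size_t N = 8;

// Option 1: a plain scalar loop, left entirely to the auto-vectorizer.
void add_scalar(float* dst, const float* a, const float* b)
{
    for (std::size_t i = 0; i < N; ++i)
        dst[i] = a[i] + b[i];
}

// Option 2: explicit vectorization, four floats per iteration.
void add_explicit(float* dst, const float* a, const float* b)
{
    for (std::size_t i = 0; i < N; i += 4) {
        v4sf va, vb;
        std::memcpy(&va, a + i, sizeof va);   // load without assuming alignment
        std::memcpy(&vb, b + i, sizeof vb);
        v4sf vr = va + vb;                    // element-wise add on the whole pack
        std::memcpy(dst + i, &vr, sizeof vr); // store without assuming alignment
    }
}
```

Both functions compute the same result. The second is the kind of code we would have to write and maintain behind our template layers if we cannot rely on the auto-vectorizer.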