Dear All,

I am currently (co-)developing a Free (GPL/LGPL) C++ library for vector/matrix 
math.

A major decision we need to make is how to handle vectorization 
instructions (SSE). Either we rely on GCC to auto-vectorize, or we control 
the vectorization explicitly using GCC's special primitives. The latter 
solution is of course more difficult and would to some degree obfuscate our 
source code, so we wish to know whether it is really necessary.

GCC 4.3.0 does auto-vectorize our loops, but the resulting code has worse 
performance than a version with unrolled loops and no vectorization. By 
contrast, ICC auto-vectorizes the same loops in a way that makes them 
significantly faster than the unrolled-loops non-vectorized version.
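
To make "unrolled loops" concrete: a minimal sketch of compile-time unrolling 
through recursive templates. The name unrolled_assign and the plain float 
pointers are assumptions made for illustration only, not our library's code.

```cpp
// Hypothetical helper: copies src[I..N) to dst by recursive template
// instantiation, so no runtime loop remains once everything is inlined.
template<int I, int N>
struct unrolled_assign {
    static inline void run(float* dst, const float* src) {
        dst[I] = src[I];                            // one scalar assignment
        unrolled_assign<I + 1, N>::run(dst, src);   // next index, at compile time
    }
};

// Partial specialization that terminates the recursion at I == N.
template<int N>
struct unrolled_assign<N, N> {
    static inline void run(float*, const float*) {}
};
```

A call such as unrolled_assign<0, 4>::run(dst, src) then expands to four 
scalar assignments and uses no vector instructions.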

For reference, the loops in question typically look like this:
for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
{
        // some abstract c++ code with deep recursive templates and
        // deep recursive inline functions, but resulting in only a
        // few assembly instructions
        a().b().c().d(i) = x().y().z(i);
}
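
Stripped of the templates, the two options we are weighing can be sketched as 
follows. This is only an illustration under assumptions: N stands in for 
COMPILE_TIME_CONSTANT, the element-wise addition and all function names are 
made up, and the explicit version uses the GCC vector extension 
(vector_size attribute) rather than any particular intrinsics header.

```cpp
#include <cstring>

// GCC vector extension: four packed floats, 16 bytes (one SSE register).
typedef float v4sf __attribute__((vector_size(16)));

const int N = 16;  // hypothetical stand-in for COMPILE_TIME_CONSTANT

// Option 1: plain loop, left to the auto-vectorizer
// (e.g. compiled with -O3 or -ftree-vectorize).
void add_auto(const float* x, const float* y, float* out) {
    for (int i = 0; i < N; ++i)
        out[i] = x[i] + y[i];
}

// Option 2: explicit vectorization, four floats per iteration.
void add_explicit(const float* x, const float* y, float* out) {
    int i = 0;
    for (; i + 4 <= N; i += 4) {
        v4sf a, b;
        std::memcpy(&a, x + i, sizeof a);   // load without assuming alignment
        std::memcpy(&b, y + i, sizeof b);
        v4sf c = a + b;                     // element-wise add on the vector type
        std::memcpy(out + i, &c, sizeof c); // store four results
    }
    for (; i < N; ++i)                      // scalar tail if N is not a multiple of 4
        out[i] = x[i] + y[i];
}

// Returns true when both versions produce the same output on a small input.
bool kernels_agree() {
    float x[N], y[N], r1[N], r2[N];
    for (int i = 0; i < N; ++i) {
        x[i] = 0.5f * i;   // exactly representable values
        y[i] = 2.0f * i;
    }
    add_auto(x, y, r1);
    add_explicit(x, y, r2);
    for (int i = 0; i < N; ++i)
        if (r1[i] != r2[i]) return false;
    return true;
}
```

Even in this tiny case the explicit version is noticeably noisier than the 
plain loop, which is the obfuscation cost mentioned above.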

As said above, it is crucial for us to get an idea of what to expect, 
because our design decisions depend on it. Should we expect large 
improvements in auto-vectorization in 4.3.x, 4.4 or 4.5?

A roadmap, or a GCC developer sharing their thoughts, would be very helpful.

Cheers,

Benoit

P.S. I have noticed huge improvements in GCC recently and would like to thank 
all the developers for that. This is what makes me hope that GCC might soon 
handle auto-vectorization in a way that allows me to rely on it!
