On Mon, Mar 17, 2008 at 03:45:49PM +0100, Benoît Jacob wrote:
> Dear All,
> 
> I am currently (co-)developing a Free (GPL/LGPL) C++ library for
> vector/matrix math.
> 
> A major decision we need to make is how to handle vectorization
> instructions (SSE). Either we rely on GCC to auto-vectorize, or we control
> the vectorization explicitly using GCC's special primitives. The latter
> solution is of course more difficult, and would to some degree obfuscate our
> source code, so we wish to know whether or not it's really necessary.
> 
> GCC 4.3.0 does auto-vectorize our loops, but the resulting code has worse 
> performance than a version with unrolled loops and no vectorization. By 
> contrast, ICC auto-vectorizes the same loops in a way that makes them 
> significantly faster than the unrolled-loops non-vectorized version.
> 
> If you want to know, the loops in question typically look like:
> for(int i = 0; i < COMPILE_TIME_CONSTANT; i++)
> {
>       // some abstract c++ code with deep recursive templates and
>       // deep recursive inline functions, but resulting in only a
>       // few assembly instructions
>       a().b().c().d(i) = x().y().z(i);
> }

Are they for 64-bit or 32-bit targets?  Are a/b/c/d/x/y/z arrays on
the stack? I suggest you open a bug report so that the GCC vectorizer
people can take a look.
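
A reduced, self-contained test case attached to the report usually helps.
As a minimal sketch only, assuming a fixed-size float vector with inlined
accessors (Vec4, add_loop and add_explicit are made-up names, not taken
from your library), something along these lines lets the auto-vectorized
loop be compared against GCC's generic vector extension:

```cpp
// Hypothetical reduced test case. Vec4, add_loop and add_explicit are
// illustrative names and are not part of the library under discussion.
#include <cstring>

const int N = 4;  // stands in for COMPILE_TIME_CONSTANT

struct Vec4 {
    float data[N] __attribute__((aligned(16)));  // 16-byte aligned for SSE
    float& operator()(int i) { return data[i]; }
    float operator()(int i) const { return data[i]; }
};

// Candidate for auto-vectorization: fixed trip count, inlined accessors.
void add_loop(Vec4& dst, const Vec4& a, const Vec4& b) {
    for (int i = 0; i < N; i++)
        dst(i) = a(i) + b(i);
}

// Explicit variant using GCC's generic vector extension (vector_size).
typedef float v4sf __attribute__((vector_size(16)));

void add_explicit(Vec4& dst, const Vec4& a, const Vec4& b) {
    v4sf va, vb;
    std::memcpy(&va, a.data, sizeof va);  // copy four floats into a vector
    std::memcpy(&vb, b.data, sizeof vb);
    v4sf vr = va + vb;                    // element-wise add on all four lanes
    std::memcpy(dst.data, &vr, sizeof vr);
}
```

Compiling with something like "g++ -O3 -msse2 -S" and comparing the
assembly of the two functions should show whether the loop was vectorized
and how the generated code differs.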


H.J.
