On Wednesday 13 December 2006 12:44, Benoît Jacob wrote:
> I'm developing a Free C++ template library (1) in which it is very important 
> that certain loops get unrolled, but at the same time I can't unroll them by 
> hand, because they depend on template parameters.
> My problem is that G++ 4.1.1 (Gentoo) doesn't unroll these loops.
> I have written a standalone simple program showing this problem; I attach it 
> (toto.cpp) and I also paste it below. This program does a loop if UNROLL is 
> not defined, and does the same thing but with the loop unrolled by hand if 
> UNROLL is defined. So one would expect that with g++ -O3, the speed would be 
> the same in both cases. Alas, it's not:
> g++ -DUNROLL -O3 toto.cpp -o toto   ---> toto runs in 0.3 seconds
> g++ -O3 toto.cpp -o toto            ---> toto runs in 1.9 seconds
> So what can I do? Is that a bug in g++?

The C++ standard doesn't require a compiler to unroll loops, so this
cannot be classified as a "real" bug.

# g++ -c -O3 toto.cpp -o toto.o
# g++ -DUNROLL -O3 toto.cpp -o toto_unroll.o -c
# size toto.o toto_unroll.o
   text    data     bss     dec     hex filename
    525       8       1     534     216 toto.o
    359       8       1     368     170 toto_unroll.o

How can a C++ compiler know that you are willing to trade
so much text size for performance?

I usually find myself on the opposite side: I use -Os, but gcc
still eats more space in the name of speed in certain cases.

Re code: I would use memset + just a single, non-nested for()
loop anyway... you C++ people tend to overtax the compiler with
optimizations. Is it really necessary to write (i == j) * factor
when (i == j) ? factor : 0 is easier for the compiler to grok?
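
A rough sketch of what I mean. I'm assuming here that the loop in
toto.cpp fills a small row-major N x N matrix of doubles with factor
on the diagonal and zero elsewhere; the function names and the storage
layout below are made up for illustration and are not from toto.cpp
or Eigen.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from toto.cpp): nested loops, with the
// diagonal test written as a conditional instead of (i == j) * factor.
template <std::size_t N>
void set_scaled_identity_nested(double* m, double factor)
{
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            m[i * N + j] = (i == j) ? factor : 0.0;
}

// Hypothetical helper (not from toto.cpp): memset plus one flat loop.
template <std::size_t N>
void set_scaled_identity(double* m, double factor)
{
    // Assumption: all-zero bytes represent 0.0, which holds for
    // IEEE 754 doubles.
    std::memset(m, 0, N * N * sizeof(double));

    // In row-major storage the diagonal entries sit N + 1 elements apart.
    for (std::size_t i = 0; i < N; ++i)
        m[i * (N + 1)] = factor;
}
```

With a double m[9], calling set_scaled_identity<3>(m, 2.5) should leave
the same contents as the nested version. Whether gcc unrolls either
form for a given N is still up to the optimizer.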

> If yes, any hope to see it fixed soon? 
> Cheers,
> Benoit
> (1) : Eigen, see http://eigen.tuxfamily.org

"Eigen is a lightweight C++ template library for vector and matrix
math, a.k.a. linear algebra."

A template lib for vector and matrix math sounds like a performance
disaster in the making, at least to me. However, maybe you are
a truly smart guy and can work miracles.


