> Tree level unrollers (cunrolli and cunroll) do complete unroll. At O2,
> both of them are turned on, but gcc does not allow any code growth --
> which makes them pretty useless at O2 (very few loops qualify). The
> default max complete peel iteration is also too low compared with both
> icc and llvm. This needs to be tuned.

I found that at -O3 (where tree unroll is on by default) there is quite
a bit of useless unrolling. I got somewhat irritated that my printf
debug loops were commonly unrolled.

-Andi
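
For illustration, a minimal sketch of the kind of printf debug loop meant above: a small constant trip count with a body that is all I/O, so unrolling it buys nothing. The names debug_dump and N_SLOTS are made up for this example. The pragma is GCC's per-loop unroll hint (GCC 8 and later); as far as I know a value of 1 asks GCC not to unroll the loop, but whether a given loop would have been completely unrolled without it depends on the GCC version and the size limits in effect.

```c
#include <stdio.h>

#define N_SLOTS 4  /* hypothetical small, compile-time-constant trip count */

/* Debug helper: prints every slot and returns how many lines were written. */
int debug_dump(FILE *out, const int slots[N_SLOTS])
{
    int printed = 0;

    /* Ask GCC (8 or later) not to unroll this loop. Other compilers may
       ignore the pragma or warn about it. */
#pragma GCC unroll 1
    for (int i = 0; i < N_SLOTS; i++) {
        fprintf(out, "slot[%d] = %d\n", i, slots[i]);
        printed++;
    }
    return printed;
}
```

To check what the complete unroller actually did with a loop like this, compiling with -O3 -fdump-tree-cunroll-details and reading the resulting dump file should show it. The limits discussed in the quoted text can be tried out per compilation with --param max-completely-peel-times=N and --param max-completely-peeled-insns=N instead of editing the source.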