http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717
--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-15 10:52:13 UTC ---
OK, 4.7 vectorizes two loops in the function cptrf2:

the loop at ../a.f90:3538

      if (nxtr < 4) then
         kerr = 1
         do ixtr = 1, nxtr - 1
            ixtrt (ixtr) = ixtr + 1
         enddo
         goto 9000
      endif

and the loop at ../a.f90:3530

      ixtrt = 0

The second loop is recognized as memset by mainline, so it remains to figure
out what is wrong with the first loop. It is unrolled:

Analyzing # of iterations of loop 9
  exit condition [1, + , 1](no_overflow) != ival2_27 + -1
  bounds on difference of bases: 0 ... 1
  result:
    # of iterations (unsigned int) ival2_27 + 4294967294, bounded by 1
Loop 9 iterates at most 1 times.
Estimating sizes for loop 9
 BB: 8, after_exit: 0
  size:   0 _38 = (integer(kind=8)) ixtr_12;
   Induction variable computation will be folded away.
  size:   1 _39 = _38 + -1;
   Induction variable computation will be folded away.
  size:   1 ixtr_40 = ixtr_12 + 1;
   Induction variable computation will be folded away.
  size:   1 *ixtrt_33(D)[_39] = ixtr_40;
  size:   2 if (ixtr_12 == _37)
   Exit condition will be eliminated in last copy.
 BB: 79, after_exit: 1
size: 5-2, last_iteration: 5-4
  Loop size: 5
  Estimated size after unrolling: 2
Unrolled loop 9 completely (duplicated 1 times).

I do not quite see why it iterates at most once, but it seems to work. So I
would say that it is a good idea to unroll rather than vectorize.

Does the slowdown still reproduce with my patch?
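
One possible reading of the "bounded by 1" result, offered as a sketch rather than a statement of what the analysis actually proved: if the iteration count in the dump refers to executions of the loop back edge (latch), then the exit test ixtr == nxtr - 1 with ixtr starting at 1 gives nxtr - 2 latch executions. The guard nxtr < 4 caps that at 1, and the loop is only entered when nxtr >= 2. The Python model below uses hypothetical helper names (latch_executions, niter_formula) and assumes 32-bit unsigned wraparound for the expression printed in the dump.

```python
MASK32 = 0xFFFFFFFF  # assumption: "unsigned int" is 32 bits wide


def latch_executions(nxtr):
    """Count back-edge executions of: do ixtr = 1, nxtr - 1 (hypothetical model)."""
    if nxtr - 1 < 1:
        return 0  # loop body is never entered
    latch = 0
    ixtr = 1
    while True:
        # loop body would be: ixtrt(ixtr) = ixtr + 1
        if ixtr == nxtr - 1:  # exit condition as shown in the dump
            break
        ixtr += 1
        latch += 1
    return latch


def niter_formula(nxtr):
    """(unsigned int) nxtr + 4294967294, i.e. nxtr - 2 modulo 2**32."""
    return (nxtr + 4294967294) & MASK32


# Only nxtr = 2 and nxtr = 3 both enter the loop and satisfy nxtr < 4.
for nxtr in (2, 3):
    print(nxtr, latch_executions(nxtr), niter_formula(nxtr))
```

Under these assumptions both functions agree for nxtr = 2 (0 latch executions) and nxtr = 3 (1 latch execution), which would match "duplicated 1 times": the body runs at most twice, so a single extra copy is enough.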