In the testcase below, after the inner loop is completely unrolled, the enclosing i-loop is not unrolled because analysis of the loop iv fails, possibly due to a bug in df:

#define N 40
#define M 10
float in[N+M], coeff[M], out[N];

void fir (){
  int i,j,k;
  float diff;

  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }
}

Compiler options used:
/Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize
(without -ftree-vectorize the i-loop does get unrolled).

Detailed description and discussion here:
http://gcc.gnu.org/ml/gcc/2007-08/msg00482.html

Here are the relevant pieces from the RTL dump (at loop3_unroll):

bb2:
(insn 40 39 41 2 vect-outer-fir2-kernel.c:38 (set (reg:DI 187 [ ivtmp.59 ])
        (mem/u/c:DI (plus:DI (reg:DI 2 2)
                (const:DI (minus:DI (symbol_ref/u:DI ("*.LC4") [flags 0x2])
                        (symbol_ref:DI ("*.LCTOC1"))))) [7 S8 A8])) 344 {*movdi_internal64} (expr_list:REG_EQUAL (symbol_ref:DI ("fir_out") [flags 0x80] <var_decl 0xf7d571c0 fir_out>)
        (nil)))
...
(insn 289 288 68 2 (set (reg/f:DI 319)
        (plus:DI (reg:DI 187 [ ivtmp.59 ])
            (const_int 160 [0xa0]))) 80 {*adddi3_internal1} (expr_list:REG_DEAD (reg:DI 2 2)
        (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("fir_out") [flags 0x80] <var_decl 0xf7d571c0 fir_out>)
                    (const_int 160 [0xa0])))
            (nil))))
...

loop:
bb3 (loop-header):
...
(insn 255 254 256 3 vect-outer-fir2-kernel.c:47 (set (reg:DI 187 [ ivtmp.59 ])
        (plus:DI (reg:DI 187 [ ivtmp.59 ])
            (const_int 16 [0x10]))) 80 {*adddi3_internal1} (nil))
...
(insn 265 263 266 3 vect-outer-fir2-kernel.c:47 (set (reg:CC 316)
        (compare:CC (reg:DI 187 [ ivtmp.59 ])
            (reg/f:DI 319))) 459 {*cmpdi_internal1} (expr_list:REG_EQUAL (compare:CC (reg:DI 187 [ ivtmp.59 ])
            (const:DI (plus:DI (symbol_ref:DI ("fir_out") [flags 0x80] <var_decl 0xf7d571c0 fir_out>)
                    (const_int 160 [0xa0]))))
        (nil)))

Below is the output of df_ref_debug for adef in each iteration of the loop in latch_dominating_def:

d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain { }
d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { }

For both the bitmap is set.

-- 
           Summary: failing rtl iv analysis (maybe due to df)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224