Hello,

I tried to use the doloop_end pattern to reduce loop overhead for our target processor, which features a dedicated loop instruction. Somehow even a simple loop cannot pass the test in doloop_condition_get, which requires the following canonical pattern.

  /* The canonical doloop pattern we expect has one of the following
     forms:

     1)  (parallel [(set (pc) (if_then_else (condition)
                                            (label_ref (label))
                                            (pc)))
                    (set (reg) (plus (reg) (const_int -1)))
                    (additional clobbers and uses)])

     The branch must be the first entry of the parallel (also required
     by jump.c), and the second entry of the parallel must be a set of
     the loop counter register.  Some targets (IA-64) wrap the set of
     the loop counter in an if_then_else too.

     2)  (set (reg) (plus (reg) (const_int -1))
         (set (pc) (if_then_else (reg != 0)
                                 (label_ref (label))
                                 (pc))).  */

Here is the simple function I used. It should meet all the requirements of the doloop optimization.

    void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
    {
        int i;
        for (i=0; i<64; i++)
        {
            out[i] = b_inout[i] + s;
        }
    }

In the tree ivcanon pass, it is converted to:

    ;; Function Unroll (Unroll)

    Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
    {
      unsigned int ivtmp.14;
      int pretmp.9;
      long unsigned int pretmp.8;
      int storetmp.6;
      int i;
      int D.1459;
      int D.1458;
      int D.1457;
      int * D.1456;
      int * D.1455;
      long unsigned int D.1454;
      long unsigned int D.1453;

    <bb 2>:
      pretmp.9_8 = (int) s_12(D);

    <bb 3>:
      # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
      # i_19 = PHI <i_15(4), 0(2)>
      D.1453_3 = (long unsigned int) i_19;
      D.1454_4 = D.1453_3 * 4;
      D.1455_6 = out_5(D) + D.1454_4;
      D.1456_10 = b_inout_9(D) + D.1454_4;
      D.1457_11 = *D.1456_10;
      D.1459_14 = pretmp.9_8 + D.1457_11;
      *D.1455_6 = D.1459_14;
      i_15 = i_19 + 1;
      ivtmp.14_21 = ivtmp.14_13 - 1;
      if (ivtmp.14_21 != 0)
        goto <bb 4>;
      else
        goto <bb 5>;

    <bb 4>:
      goto <bb 3>;

    <bb 5>:
      return;

    }

This should match the requirements of doloop_condition_get.

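In source-level terms, the loop shape in the ivcanon dump can be sketched in plain C as below. This is only an illustration: the function names (unroll_src, unroll_countdown, shapes_agree) are hypothetical, the unused N parameter is dropped, and nothing here implies that writing the loop this way makes GCC keep the counter through later passes. The sketch just shows the separate counter that starts at 64, is decremented by 1, and is tested against zero, next to the original loop.

```c
/* The source loop from the post (N omitted, since it is unused). */
void unroll_src(short s, int *restrict b_inout, int *restrict out)
{
    int i;
    for (i = 0; i < 64; i++)
        out[i] = b_inout[i] + s;
}

/* Hypothetical hand-written mirror of the ivcanon dump: besides i,
   a dedicated counter (ivtmp.14 in the dump) counts down from 64
   and the loop exits when it reaches zero. */
void unroll_countdown(short s, int *restrict b_inout, int *restrict out)
{
    unsigned int ivtmp = 64;   /* loop counter, as in the PHI node */
    int i = 0;
    do {
        out[i] = b_inout[i] + s;
        i = i + 1;
        ivtmp = ivtmp - 1;     /* (set (reg) (plus (reg) (const_int -1))) */
    } while (ivtmp != 0);      /* branch back while the counter is nonzero */
}

/* Returns 1 if both versions write identical results, 0 otherwise. */
int shapes_agree(void)
{
    int in[64], out_a[64], out_b[64];
    int k;
    for (k = 0; k < 64; k++)
        in[k] = 3 * k - 7;     /* arbitrary test data */
    unroll_src(5, in, out_a);
    unroll_countdown(5, in, out_b);
    for (k = 0; k < 64; k++)
        if (out_a[k] != out_b[k])
            return 0;
    return 1;
}
```

Calling shapes_agree() from a small test driver confirms that the two functions compute the same values; the only difference is how the exit condition is expressed.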
But after the ivopts pass, the code is transformed to:

    ;; Function Unroll (Unroll)

    Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
    {
      long unsigned int ivtmp.21;
      unsigned int ivtmp.14;
      int pretmp.9;
      long unsigned int pretmp.8;
      int storetmp.6;
      int i;
      int D.1459;
      int D.1458;
      int D.1457;
      int * D.1456;
      int * D.1455;
      long unsigned int D.1454;
      long unsigned int D.1453;

    <bb 2>:
      pretmp.9_8 = (int) s_12(D);

    <bb 3>:
      # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
      D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
      D.1459_14 = pretmp.9_8 + D.1457_11;
      MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
      ivtmp.21_16 = ivtmp.21_7 + 4;
      if (ivtmp.21_16 != 256)
        goto <bb 4>;
      else
        goto <bb 5>;

    <bb 4>:
      goto <bb 3>;

    <bb 5>:
      return;

    }

The loop is no longer in the required canonical form: the down-counting ivtmp.14 is gone, and the exit test now compares the byte offset ivtmp.21 against 256. The later RTL-level optimizations cannot convert it back. Since the loop doesn't pass the doloop_condition_get test, the modulo scheduling pass doesn't work either.

Am I missing something here? Any hint is greatly appreciated.

Cheers,
Bingfeng Mei