I have a loop induction variable question involving post increment. If I have this loop:

    void *memcpy_word_ptr(int * __restrict d, int * __restrict s, unsigned int n )
    {
      int i;
      for(i=0; i<n; i++) {*d++ = *s++; }
      return d;
    }

and compile it with: -O3 -fno-tree-loop-distribute-patterns, the loop induction variable pass (ivopts) converts this loop:

    <bb 4>:
      # d_22 = PHI <d_10(6), d_5(D)(3)>
      # s_23 = PHI <s_11(6), s_6(D)(3)>
      # i_24 = PHI <i_14(6), 0(3)>
      d_10 = d_22 + 4;
      s_11 = s_23 + 4;
      _12 = *s_23;
      *d_22 = _12;
      i_14 = i_24 + 1;
      i.2_8 = (unsigned int) i_14;
      if (i.2_8 < n_9(D))
        goto <bb 6>;   # bb 6 just loops back to bb 4
      else
        goto <bb 5>;

into this loop (using -4 offsets to compensate for incrementing the 's' and 'd' variables before their use):

    <bb 4>:
      # d_22 = PHI <d_10(6), d_5(D)(3)>
      # s_23 = PHI <s_11(6), s_6(D)(3)>
      # i_24 = PHI <i_14(6), 0(3)>
      d_10 = d_22 + 4;
      s_11 = s_23 + 4;
      _12 = MEM[base: s_11, offset: 4294967292B];
      MEM[base: d_10, offset: 4294967292B] = _12;
      i_14 = i_24 + 1;
      if (i_14 != _2)
        goto <bb 6>;   # bb 6 just loops back to bb 4
      else
        goto <bb 5>;

But if I increment s and d by hand after the copy like this:

    void *memcpy_word_ptr(int * __restrict d, int * __restrict s, unsigned int n )
    {
      int i;
      for(i=0; i<n; i++) {*d = *s; d++; s++; }
      return d;
    }

Then ivopts converts this loop:

    <bb 4>:
      # d_22 = PHI <d_12(6), d_5(D)(3)>
      # s_23 = PHI <s_13(6), s_6(D)(3)>
      # i_24 = PHI <i_14(6), 0(3)>
      _10 = *s_23;
      *d_22 = _10;
      d_12 = d_22 + 4;
      s_13 = s_23 + 4;
      i_14 = i_24 + 1;
      i.0_8 = (unsigned int) i_14;
      if (i.0_8 < n_9(D))
        goto <bb 6>;   # bb 6 just loops back to bb 4
      else
        goto <bb 5>;

into this loop (with 0 offsets):

    <bb 4>:
      # d_22 = PHI <d_12(6), d_5(D)(3)>
      # s_23 = PHI <s_13(6), s_6(D)(3)>
      # i_24 = PHI <i_14(6), 0(3)>
      _10 = MEM[base: s_23, offset: 0B];
      MEM[base: d_22, offset: 0B] = _10;
      d_12 = d_22 + 4;
      s_13 = s_23 + 4;
      i_14 = i_24 + 1;
      if (i_14 != _2)
        goto <bb 6>;   # bb 6 just loops back to bb 4
      else
        goto <bb 5>;

My question is: why (and where) did ivopts decide to move the post-increments above the usages in the first loop?

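To make the difference concrete, here is a rough source-level sketch of what the two ivopts results amount to. The function names are made up for illustration, and this is only my reading of the dumps, not what GCC literally produces. The offset 4294967292B is -4 written as an unsigned 32-bit value (2^32 - 4), which corresponds to index -1 when int is 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: models the first ivopts result. The pointers are
   bumped first, then the access goes through index -1 (byte offset -4
   when int is 4 bytes). */
int *copy_bump_then_use(int *restrict d, const int *restrict s, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        d++;
        s++;
        d[-1] = s[-1];
    }
    return d;
}

/* Hypothetical name: models the second ivopts result. The access uses
   offset 0, then the pointers are bumped. */
int *copy_use_then_bump(int *restrict d, const int *restrict s, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        d[0] = s[0];
        d++;
        s++;
    }
    return d;
}

/* Small self-check: both variants copy the same data and return a
   pointer one past the last element written. */
void check_variants_agree(void)
{
    int src[5] = {10, 20, 30, 40, 50};
    int a[5] = {0};
    int b[5] = {0};

    int *end_a = copy_bump_then_use(a, src, 5);
    int *end_b = copy_use_then_bump(b, src, 5);

    assert(end_a == a + 5);
    assert(end_b == b + 5);
    for (size_t i = 0; i < 5; i++) {
        assert(a[i] == src[i]);
        assert(b[i] == src[i]);
    }
}
```

Both functions do the same work; the only difference is whether the memory access is expressed relative to the old or the new pointer value, which is the part that matters for the generated addressing.
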
In my case (MIPS), the second loop generates better code than the first, and I would like to avoid the '-4' offsets. Ideally, GCC would generate the same code for both of these loops, but it does not.

Steve Ellcey
sell...@mips.com