On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos <pma...@broadcom.com> wrote:
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: 08 January 2014 11:03
>> To: Paulo Matos
>> Cc: Andrew Haley; gcc@gcc.gnu.org
>> Subject: Re: Infinite number of iterations in loop [v850, mep]
>>
>> That was refering to the case with extern b.  For the above case the
>> issue must be sth else.  Trying a cross to v850-elf to see if it
>> reproduces for me (if 'b' is a stack or argument slot then we might
>> bogously think that *c++ = 0 may clobber it, otherwise RTL
>> number of iteration analysis might just be confused).
>>
>> So for example (should be arch independent)
>>
>> struct X { int i; int j; int k; int l[24]; };
>>
>> int foo (struct X x, int *p)
>> {
>>   int z = x.j;
>>   *p = 1;
>>   return z;
>> }
>>
>> see if there is a anti-dependence between x.j and *p on the RTL side
>> (at least the code dispatching to the tree oracle using the MEM_EXPRs
>> should save you from that).
>>
>> So - v850 at least doesn't pass b in memory and the doloop recognition
>> works for me (on trunk).
>>
>
> You are right, everything is fine with the above example regarding the
> anti-dependence and with the loop as well. I got confused with mine not
> generating a loop for
> void fn1 (unsigned int b)
> {
>   unsigned int a;
>   for (a = 0; a < b; a++)
>     *c++ = 0;
> }
>
> but that simply because in our case it is not profitable.
>
> However, for the case:
> void matrix_add_const(unsigned int N, short *A, short val) {
>   unsigned int i,j;
>   for (i=0; i<N; i++) {
>     for (j=0; j<N; j++) {
>       A[i*N+j] += val;
>     }
>   }
> }
>
> GCC thinks for v850 and my port that the inner loop might be infinite.
> It looks like GCC is mangling the loop so much that the obviousness that the
> inner loop is finite is lost.
>
> This however turns out to be very performance degrading. Using -fno-ivopts
> makes generation of loops work again both in my port and v850.
> Is there a way to fine-tune ivopts besides trying to tune the costs or do you
> reckon this is something iv-analysis should be smarter about?

Well.  We have

Loop 2 is simple:
  simple exit 5 -> 7
  infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
        (const_int 1 [0x1]))
    (nil))
  number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
            (reg:SI 64 [ ivtmp___6 ]))
        (const_int -2 [0xfffffffffffffffe]))
    (const_int 1 [0x1]))
  upper bound: 2147483646
  realistic bound: -1
Doloop: Possible infinite iteration case.
Doloop: The loop is not suitable.

as we replaced the induction variable by a pointer induction with
step 2.  So this might be a very common issue for RTL loop opts:
the upper bound of the IV is 2 * N in this case, so 2 * N & 1
should always be false and thus the "infinite" condition should be
optimized away.

(insn 34 33 36 3 (parallel [
            (set (reg:SI 76)
                (plus:SI (reg/v:SI 71 [ N ])
                    (reg/v:SI 71 [ N ])))
            (clobber (reg:CC 32 psw))
        ]) 21 {addsi3}
     (expr_list:REG_UNUSED (reg:CC 32 psw)
        (nil)))

That doesn't look too difficult to do with the above definition.
nonzero_bits might be of use here, not sure (not my area of expertise).

Richard.

> Paulo Matos
>
>> Richard.
>>
>> > The same situation occurs with a coremark function:
>> > void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) {
>> >   ee_u32 i,j;
>> >   for (i=0; i<N; i++) {
>> >     for (j=0; j<N; j++) {
>> >       A[i*N+j] += val;
>> >     }
>> >   }
>> > }
>> >
>> > It also says the inner loop might be infinite but it can't N is given as
>> > argument. j starts as 0, N is unsigned like N. This will loop N times. GCC
>> > cannot possibly assume array A will overwrite the value of N in the loop.
>> > This seems like something is wrong in alias analysis.
>> >
>>
>> --
>>
>> PMatos
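
As an illustration of the parity argument in the reply above, here is a minimal
C sketch. It is not GCC code: the helper names (infinite_condition, niter_expr,
count_latch, check_all) are hypothetical, and it assumes 32-bit wrapping
arithmetic as in SImode and that the dump's "number of iterations" counts
back-edge traversals. It shows that (N + N) & 1 is zero for every N, including
values where N + N wraps, and that the dumped expression agrees with a direct
walk of a step-2 induction variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the "infinite if" condition from the dump,
   with reg 76 = N + N as set by insn 34. */
static uint32_t infinite_condition(uint32_t n)
{
    uint32_t doubled = n + n;   /* wraps modulo 2^32, result is still even */
    return doubled & 1u;
}

/* Hypothetical helper modelling the dumped expression
   (lshiftrt (plus (minus end start) -2) 1). */
static uint32_t niter_expr(uint32_t start, uint32_t end)
{
    return (end - start - 2u) >> 1;
}

/* Walk an offset in steps of 2 (sizeof(short)) with a != exit test and
   count how often the back edge is taken.  Only terminates when
   end - start is a nonzero multiple of 2, which is the case for 2 * N
   with N >= 1. */
static uint32_t count_latch(uint32_t start, uint32_t end)
{
    uint32_t latch = 0;
    uint32_t p = start;
    for (;;) {
        p += 2u;                /* loop body: one element processed */
        if (p == end)
            break;              /* exit edge */
        latch++;                /* back edge */
    }
    return latch;
}

/* Check both claims for N in [1, max_n]. */
static void check_all(uint32_t max_n)
{
    for (uint32_t n = 1; n <= max_n; n++) {
        assert(infinite_condition(n) == 0u);
        assert(count_latch(0u, 2u * n) == niter_expr(0u, 2u * n));
        assert(niter_expr(0u, 2u * n) == n - 1u);
    }
}
```

Calling check_all(1000) exercises both properties for small N;
infinite_condition(0xFFFFFFFFu) covers a wrap-around case. This only models the
arithmetic; whether nonzero_bits can derive the same fact inside the RTL
iteration analysis is left open, as in the reply.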