> -----Original Message----- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: 08 January 2014 14:42 > To: Paulo Matos > Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka > Subject: Re: Infinite number of iterations in loop [v850, mep] > > On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos <pma...@broadcom.com> wrote: > >> -----Original Message----- > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> Sent: 08 January 2014 11:03 > >> To: Paulo Matos > >> Cc: Andrew Haley; gcc@gcc.gnu.org > >> Subject: Re: Infinite number of iterations in loop [v850, mep] > >> > >> That was refering to the case with extern b. For the above case the > >> issue must be sth else. Trying a cross to v850-elf to see if it > >> reproduces for me (if 'b' is a stack or argument slot then we might > >> bogously think that *c++ = 0 may clobber it, otherwise RTL > >> number of iteration analysis might just be confused). > >> > >> So for example (should be arch independent) > >> > >> struct X { int i; int j; int k; int l[24]; }; > >> > >> int foo (struct X x, int *p) > >> { > >> int z = x.j; > >> *p = 1; > >> return z; > >> } > >> > >> see if there is a anti-dependence between x.j and *p on the RTL side > >> (at least the code dispatching to the tree oracle using the MEM_EXPRs > >> should save you from that). > >> > >> So - v850 at least doesn't pass b in memory and the doloop recognition > >> works for me (on trunk). > >> > > > > You are right, everything is fine with the above example regarding the anti- > dependence and with the loop as well. I got confused with mine not generating > a > loop for > > void fn1 (unsigned int b) > > { > > unsigned int a; > > for (a = 0; a < b; a++) > > *c++ = 0; > > } > > > > but that simply because in our case it is not profitable. > > > > However, for the case: > > void matrix_add_const(unsigned int N, short *A, short val) { > > unsigned int i,j; > > for (i=0; i<N; i++) { > > for (j=0; j<N; j++) { > > A[i*N+j] += val; > > } > > } > > } > > > > GCC thinks for v850 and my port that the inner loop might be infinite. > > It looks like GCC is mangling the loop so much that the obviousness that the > inner loop is finite is lost. > > > > This however turns out to be very performance degrading. Using -fno-ivopts > makes generation of loops work again both in my port and v850. > > Is there a way to fine-tune ivopts besides trying to tune the costs or do > > you > reckon this is something iv-analysis should be smarter about? > > Well. We have > > Loop 2 is simple: > simple exit 5 -> 7 > infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76) > (const_int 1 [0x1])) > (nil)) > number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ]) > (reg:SI 64 [ ivtmp___6 ])) > (const_int -2 [0xfffffffffffffffe])) > (const_int 1 [0x1])) > upper bound: 2147483646 > realistic bound: -1 > Doloop: Possible infinite iteration case. > Doloop: The loop is not suitable. > > as we replaced the induction variable by a pointer induction with > step 2. So this might be a very common issue for RTL loop opts, > the upper bound of the IV is 2 * N in this case, so 2 * N & 1 > should be always false and thus "infinite" be optimized. > > (insn 34 33 36 3 (parallel [ > (set (reg:SI 76) > (plus:SI (reg/v:SI 71 [ N ]) > (reg/v:SI 71 [ N ]))) > (clobber (reg:CC 32 psw)) > ]) 21 {addsi3} > (expr_list:REG_UNUSED (reg:CC 32 psw) > (nil))) > > that doesn't look too difficult to do with the above definition. > nonzero_bits might be of use here, not sure (not my area of > expertise). >
I would like some comments on the following patch that seems to work but I think it could be generalized. The idea is for the specific infinite condition of type (and reg int), we can search for the definition of reg, check nonzero_bits and check that they don't match any of the bits in int. diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c index 4c34007..215fd22 100644 --- a/gcc/loop-iv.c +++ b/gcc/loop-iv.c @@ -2064,6 +2064,50 @@ simplify_using_initial_values (struct loop *loop, enum rtx_code op, rtx *expr) e = single_pred_edge (e->src); } + /* For certain patterns we can do even better, like (and (reg) 1). */ + if (GET_CODE (*expr) == AND + && REG_P (XEXP (*expr, 0)) + && CONST_INT_P (XEXP (*expr, 1))) + { + rtx reg = XEXP (*expr, 0); + unsigned HOST_WIDE_INT mask = INTVAL (XEXP (*expr, 1)); + rtx insn_def = NULL_RTX; + basic_block bb = loop_preheader_edge (loop)->src; + + while (1) + { + rtx insn; + + if (bb == ENTRY_BLOCK_PTR) + break; + + FOR_BB_INSNS_REVERSE (bb, insn) + { + if (!INSN_P (insn)) + break; + if (df_reg_defined (insn, reg)) + { + insn_def = insn; + break; + } + } + + if (insn_def) + break; + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + } + + if (insn_def) + { + rtx set = single_set (insn_def); + unsigned HOST_WIDE_INT nonzero = nonzero_bits (SET_SRC (set), GET_MODE (reg)); + + /* Assumption will never occur and can be optimized. */ + if ((mask & nonzero) == 0) + *expr = const0_rtx; + } + } + out: free_EXPR_LIST_list (&cond_list); if (!CONSTANT_P (*expr)) > Richard. > > > Paulo Matos > > > >> Richard. > >> > >> > The same situation occurs with a coremark function: > >> > void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) { > >> > ee_u32 i,j; > >> > for (i=0; i<N; i++) { > >> > for (j=0; j<N; j++) { > >> > A[i*N+j] += val; > >> > } > >> > } > >> > } > >> > > >> > It also says the inner loop might be infinite but it can't N is given as > >> argument. j starts as 0, N is unsigned like N. This will loop N times. GCC > cannot > >> possibly assume array A will overwrite the value of N in the loop. This > >> seems > >> like something is wrong in alias analysis. > >> > > >> >> -- > >> >> PMatos