> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
> 
> On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos <pma...@broadcom.com> wrote:
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: 08 January 2014 11:03
> >> To: Paulo Matos
> >> Cc: Andrew Haley; gcc@gcc.gnu.org
> >> Subject: Re: Infinite number of iterations in loop [v850, mep]
> >>
> >> That was refering to the case with extern b.  For the above case the
> >> issue must be sth else.  Trying a cross to v850-elf to see if it
> >> reproduces for me (if 'b' is a stack or argument slot then we might
> >> bogously think that *c++ = 0 may clobber it, otherwise RTL
> >> number of iteration analysis might just be confused).
> >>
> >> So for example (should be arch independent)
> >>
> >> struct X { int i; int j; int k; int l[24]; };
> >>
> >> int foo (struct X x, int *p)
> >> {
> >>   int z = x.j;
> >>   *p = 1;
> >>   return z;
> >> }
> >>
> >> see if there is a anti-dependence between x.j and *p on the RTL side
> >> (at least the code dispatching to the tree oracle using the MEM_EXPRs
> >> should save you from that).
> >>
> >> So - v850 at least doesn't pass b in memory and the doloop recognition
> >> works for me (on trunk).
> >>
> >
> > You are right, everything is fine with the above example regarding the anti-
> dependence and with the loop as well. I got confused with mine not generating 
> a
> loop for
> > void fn1 (unsigned int b)
> > {
> >   unsigned int a;
> >   for (a = 0; a < b; a++)
> >     *c++ = 0;
> > }
> >
> > but that simply because in our case it is not profitable.
> >
> > However, for the case:
> > void matrix_add_const(unsigned int N, short *A, short val) {
> >  unsigned int i,j;
> >  for (i=0; i<N; i++) {
> >   for (j=0; j<N; j++) {
> >    A[i*N+j] += val;
> >   }
> >  }
> > }
> >
> > GCC thinks for v850 and my port that the inner loop might be infinite.
> > It looks like GCC is mangling the loop so much that the obviousness that the
> inner loop is finite is lost.
> >
> > This however turns out to be very performance degrading. Using -fno-ivopts
> makes generation of loops work again both in my port and v850.
> > Is there a way to fine-tune ivopts besides trying to tune the costs or do 
> > you
> reckon this is something iv-analysis should be smarter about?
> 
> Well.  We have
> 
> Loop 2 is simple:
>   simple exit 5 -> 7
>   infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
>         (const_int 1 [0x1]))
>     (nil))
>   number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
>             (reg:SI 64 [ ivtmp___6 ]))
>         (const_int -2 [0xfffffffffffffffe]))
>     (const_int 1 [0x1]))
>   upper bound: 2147483646
>   realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
> 
> as we replaced the induction variable by a pointer induction with
> step 2.  So this might be a very common issue for RTL loop opts,
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should be always false and thus "infinite" be optimized.
> 
> (insn 34 33 36 3 (parallel [
>             (set (reg:SI 76)
>                 (plus:SI (reg/v:SI 71 [ N ])
>                     (reg/v:SI 71 [ N ])))
>             (clobber (reg:CC 32 psw))
>         ]) 21 {addsi3}
>      (expr_list:REG_UNUSED (reg:CC 32 psw)
>         (nil)))
> 
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
> 

I would like some comments on the following patch that seems to work but I 
think it could be generalized.
The idea is for the specific infinite condition of type (and reg int), we can 
search for the definition of reg,
check nonzero_bits and check that they don't match any of the bits in int.

diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 4c34007..215fd22 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2064,6 +2064,50 @@ simplify_using_initial_values (struct loop *loop, enum 
rtx_code op, rtx *expr)
       e = single_pred_edge (e->src);
     }
 
+  /* For certain patterns we can do even better, like (and (reg) 1).  */
+  if (GET_CODE (*expr) == AND
+      && REG_P (XEXP (*expr, 0))
+      && CONST_INT_P (XEXP (*expr, 1)))
+    {
+      rtx reg = XEXP (*expr, 0);
+      unsigned HOST_WIDE_INT mask = INTVAL (XEXP (*expr, 1));
+      rtx insn_def = NULL_RTX;
+      basic_block bb = loop_preheader_edge (loop)->src;
+
+      while (1)
+       {
+         rtx insn;
+
+         if (bb == ENTRY_BLOCK_PTR)
+           break;
+
+         FOR_BB_INSNS_REVERSE (bb, insn)
+           {
+             if (!INSN_P (insn))
+               break;
+             if (df_reg_defined (insn, reg))
+               {
+                 insn_def = insn;
+                 break;
+               }
+           }
+
+         if (insn_def)
+           break;
+         bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+       }
+
+      if (insn_def)
+       {
+         rtx set = single_set (insn_def);
+         unsigned HOST_WIDE_INT nonzero = nonzero_bits (SET_SRC (set), 
GET_MODE (reg));
+
+         /* Assumption will never occur and can be optimized.  */
+         if ((mask & nonzero) == 0)
+           *expr = const0_rtx;
+       }
+    }
+
  out:
   free_EXPR_LIST_list (&cond_list);
   if (!CONSTANT_P (*expr))


> Richard.
> 
> > Paulo Matos
> >
> >> Richard.
> >>
> >> > The same situation occurs with a coremark function:
> >> > void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) {
> >> >  ee_u32 i,j;
> >> >  for (i=0; i<N; i++) {
> >> >   for (j=0; j<N; j++) {
> >> >    A[i*N+j] += val;
> >> >   }
> >> >  }
> >> > }
> >> >
> >> > It also says the inner loop might be infinite but it can't N is given as
> >> argument. j starts as 0, N is unsigned like N. This will loop N times. GCC
> cannot
> >> possibly assume array A will overwrite the value of N in the loop. This 
> >> seems
> >> like something is wrong in alias analysis.
> >> >
> >> >> --
> >> >> PMatos

Reply via email to