On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos <pma...@broadcom.com> wrote:
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: 08 January 2014 11:03
>> To: Paulo Matos
>> Cc: Andrew Haley; gcc@gcc.gnu.org
>> Subject: Re: Infinite number of iterations in loop [v850, mep]
>>
>> That was refering to the case with extern b.  For the above case the
>> issue must be sth else.  Trying a cross to v850-elf to see if it
>> reproduces for me (if 'b' is a stack or argument slot then we might
>> bogously think that *c++ = 0 may clobber it, otherwise RTL
>> number of iteration analysis might just be confused).
>>
>> So for example (should be arch independent)
>>
>> struct X { int i; int j; int k; int l[24]; };
>>
>> int foo (struct X x, int *p)
>> {
>>   int z = x.j;
>>   *p = 1;
>>   return z;
>> }
>>
>> see if there is a anti-dependence between x.j and *p on the RTL side
>> (at least the code dispatching to the tree oracle using the MEM_EXPRs
>> should save you from that).
>>
>> So - v850 at least doesn't pass b in memory and the doloop recognition
>> works for me (on trunk).
>>
>
> You are right, everything is fine with the above example regarding the 
> anti-dependence and with the loop as well. I got confused with mine not 
> generating a loop for
> void fn1 (unsigned int b)
> {
>   unsigned int a;
>   for (a = 0; a < b; a++)
>     *c++ = 0;
> }
>
> but that simply because in our case it is not profitable.
>
> However, for the case:
> void matrix_add_const(unsigned int N, short *A, short val) {
>  unsigned int i,j;
>  for (i=0; i<N; i++) {
>   for (j=0; j<N; j++) {
>    A[i*N+j] += val;
>   }
>  }
> }
>
> GCC thinks for v850 and my port that the inner loop might be infinite.
> It looks like GCC is mangling the loop so much that the obviousness that the 
> inner loop is finite is lost.
>
> This however turns out to be very performance degrading. Using -fno-ivopts 
> makes generation of loops work again both in my port and v850.
> Is there a way to fine-tune ivopts besides trying to tune the costs or do you 
> reckon this is something iv-analysis should be smarter about?

Well.  We have

Loop 2 is simple:
  simple exit 5 -> 7
  infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
        (const_int 1 [0x1]))
    (nil))
  number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
            (reg:SI 64 [ ivtmp___6 ]))
        (const_int -2 [0xfffffffffffffffe]))
    (const_int 1 [0x1]))
  upper bound: 2147483646
  realistic bound: -1
Doloop: Possible infinite iteration case.
Doloop: The loop is not suitable.

as we replaced the induction variable by a pointer induction with
step 2.  So this might be a very common issue for RTL loop opts,
the upper bound of the IV is 2 * N in this case, so 2 * N & 1
should be always false and thus "infinite" be optimized.

(insn 34 33 36 3 (parallel [
            (set (reg:SI 76)
                (plus:SI (reg/v:SI 71 [ N ])
                    (reg/v:SI 71 [ N ])))
            (clobber (reg:CC 32 psw))
        ]) 21 {addsi3}
     (expr_list:REG_UNUSED (reg:CC 32 psw)
        (nil)))

that doesn't look too difficult to do with the above definition.
nonzero_bits might be of use here, not sure (not my area of
expertise).

Richard.

> Paulo Matos
>
>> Richard.
>>
>> > The same situation occurs with a coremark function:
>> > void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) {
>> >  ee_u32 i,j;
>> >  for (i=0; i<N; i++) {
>> >   for (j=0; j<N; j++) {
>> >    A[i*N+j] += val;
>> >   }
>> >  }
>> > }
>> >
>> > It also says the inner loop might be infinite but it can't N is given as
>> argument. j starts as 0, N is unsigned like N. This will loop N times. GCC 
>> cannot
>> possibly assume array A will overwrite the value of N in the loop. This seems
>> like something is wrong in alias analysis.
>> >
>> >> --
>> >> PMatos

Reply via email to