Thanks. I was looking at the bfin port. MT's implementation looks similar
but simpler.

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:[EMAIL PROTECTED] 
> Sent: 16 July 2008 19:17
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Question about doloop_end pattern
> 
> Hi Bingfeng,
> 
> > Hello,
> > I tried to use the doloop_end pattern to reduce loop overhead for our
> > target processor, which features a dedicated loop instruction.  Somehow
> > even a simple loop cannot pass the test of doloop_condition_get, which
> > requires the following canonical pattern.
> 
> 
> I checked this on our private port of GCC.  It is based on the 4.3
> branch, which is what we are working from right now.  We do use the
> doloop pattern to handle these cases in our port, and I can confirm
> that in our case we generate the following code.  Our tree does have a
> few other tweaks that we maintain and would like to contribute once
> the copyright assignments are in place.
> 
> Unroll:
>        c2c     $c5,$c2
>        i2cs    $c4,63
> .L2:
>        ldw     $c2,($c5)+=1
>        add     $c2,$c1,$c2
>        stw     ($c3)+=1,$c2
>        brinzdec        $c4,.L2
>        brz     $zero,$link
> 
> You probably want to look at the mt backend for an example of how to
> do it.  It looks similar to how we do it in ours.
> 
> 
> cheers
> Ramana
> 
> ----
> Ramana Radhakrishnan
> Icera Semiconductor
> 
> On Wed, Jul 16, 2008 at 12:05 PM, Bingfeng Mei 
> <[EMAIL PROTECTED]> wrote:
> > Hello,
> > I tried to use the doloop_end pattern to reduce loop overhead for our
> > target processor, which features a dedicated loop instruction.  Somehow
> > even a simple loop cannot pass the test of doloop_condition_get, which
> > requires the following canonical pattern.
> >
> >  /* The canonical doloop pattern we expect has one of the following
> >     forms:
> >
> >     1)  (parallel [(set (pc) (if_then_else (condition)
> >                                            (label_ref (label))
> >                                            (pc)))
> >                     (set (reg) (plus (reg) (const_int -1)))
> >                     (additional clobbers and uses)])
> >
> >     The branch must be the first entry of the parallel (also required
> >     by jump.c), and the second entry of the parallel must be a set of
> >     the loop counter register.  Some targets (IA-64) wrap the set of
> >     the loop counter in an if_then_else too.
> >
> >     2)  (set (reg) (plus (reg) (const_int -1))
> >         (set (pc) (if_then_else (reg != 0)
> >                                 (label_ref (label))
> >                                 (pc))).  */
> >
> >
> > Here is a simple function I used; it should meet all doloop
> > optimization requirements.
> > void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
> > {
> >        int i;
> >        for (i=0; i<64; i++)
> >        {
> >                out[i] = b_inout[i] +  s;
> >        }
> > }
> >
> >
> > In the tree ivcanon pass, it is converted to:
> > ;; Function Unroll (Unroll)
> >
> > Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> > {
> >  unsigned int ivtmp.14;
> >  int pretmp.9;
> >  long unsigned int pretmp.8;
> >  int storetmp.6;
> >  int i;
> >  int D.1459;
> >  int D.1458;
> >  int D.1457;
> >  int * D.1456;
> >  int * D.1455;
> >  long unsigned int D.1454;
> >  long unsigned int D.1453;
> >
> > <bb 2>:
> >  pretmp.9_8 = (int) s_12(D);
> >
> > <bb 3>:
> >  # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
> >  # i_19 = PHI <i_15(4), 0(2)>
> >  D.1453_3 = (long unsigned int) i_19;
> >  D.1454_4 = D.1453_3 * 4;
> >  D.1455_6 = out_5(D) + D.1454_4;
> >  D.1456_10 = b_inout_9(D) + D.1454_4;
> >  D.1457_11 = *D.1456_10;
> >  D.1459_14 = pretmp.9_8 + D.1457_11;
> >  *D.1455_6 = D.1459_14;
> >  i_15 = i_19 + 1;
> >  ivtmp.14_21 = ivtmp.14_13 - 1;
> >  if (ivtmp.14_21 != 0)
> >    goto <bb 4>;
> >  else
> >    goto <bb 5>;
> >
> > <bb 4>:
> >  goto <bb 3>;
> >
> > <bb 5>:
> >  return;
> >
> > }
> >
> >
> > This should match the requirements of doloop_condition_get.  But after
> > the ivopts pass, the code is transformed to:
> >
> > ;; Function Unroll (Unroll)
> >
> > Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> > {
> >  long unsigned int ivtmp.21;
> >  unsigned int ivtmp.14;
> >  int pretmp.9;
> >  long unsigned int pretmp.8;
> >  int storetmp.6;
> >  int i;
> >  int D.1459;
> >  int D.1458;
> >  int D.1457;
> >  int * D.1456;
> >  int * D.1455;
> >  long unsigned int D.1454;
> >  long unsigned int D.1453;
> >
> > <bb 2>:
> >  pretmp.9_8 = (int) s_12(D);
> >
> > <bb 3>:
> >  # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
> >  D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
> >  D.1459_14 = pretmp.9_8 + D.1457_11;
> >  MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
> >  ivtmp.21_16 = ivtmp.21_7 + 4;
> >  if (ivtmp.21_16 != 256)
> >    goto <bb 4>;
> >  else
> >    goto <bb 5>;
> >
> > <bb 4>:
> >  goto <bb 3>;
> >
> > <bb 5>:
> >  return;
> >
> > }
> >
> >
> > It is no longer in the required canonical form, and later RTL-level
> > optimizations cannot convert it back.  Since it doesn't pass the
> > doloop_condition_get test, the modulo scheduling pass doesn't work
> > either.  Am I missing something here?  Any hint is greatly appreciated.
> >
> > Cheers,
> > Bingfeng Mei
> >
> >
> >
> 
> 
> 
> -- 
> Ramana Radhakrishnan
> 
> 
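For readers following the two dumps quoted above, here is a rough C-level sketch of the loop shapes involved. It is not GCC output. The function names, the `N_ITER` constant, and the `forms_agree` helper are made up for illustration. The byte-offset version uses `sizeof(int)` where the ivopts dump shows the constants 4 and 256, which assumes a 4-byte `int` on the original target.

```c
#include <stddef.h>
#include <string.h>

enum { N_ITER = 64 };  /* trip count of the loop in the quoted example */

/* The loop as written in the source (the unused N argument is dropped). */
void unroll_source(short s, const int *restrict in, int *restrict out)
{
    for (int i = 0; i < N_ITER; i++)
        out[i] = in[i] + s;
}

/* Shape of the ivcanon dump: a separate counter starts at the trip
   count, is decremented once per iteration, and is compared with zero.
   This is the "decrement, then branch if non-zero" shape that form 2
   of the canonical doloop pattern describes. */
void unroll_countdown(short s, const int *restrict in, int *restrict out)
{
    unsigned int remaining = N_ITER;
    int i = 0;
    do {
        out[i] = in[i] + s;
        i++;
        remaining--;
    } while (remaining != 0);
}

/* Shape of the ivopts dump: the only induction variable is a byte
   offset that counts up and is compared with a non-zero end value
   (256 in the dump, assuming 4-byte int). No count-down remains. */
void unroll_byte_offset(short s, const int *restrict in, int *restrict out)
{
    size_t off = 0;
    const size_t end = N_ITER * sizeof(int);
    do {
        int v;
        memcpy(&v, (const char *)in + off, sizeof v);   /* load  */
        v += s;
        memcpy((char *)out + off, &v, sizeof v);        /* store */
        off += sizeof(int);
    } while (off != end);
}

/* Hypothetical helper: returns 1 when all three shapes compute the
   same result on a small test input, 0 otherwise. */
int forms_agree(void)
{
    int in[N_ITER], a[N_ITER], b[N_ITER], c[N_ITER];
    for (int i = 0; i < N_ITER; i++)
        in[i] = 3 * i - 50;
    unroll_source(7, in, a);
    unroll_countdown(7, in, b);
    unroll_byte_offset(7, in, c);
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, c, sizeof a) == 0;
}
```

All three functions compute the same values; only the exit test differs. The second has the count-down-to-zero exit test that the quoted comment describes, while the third has the count-up test against an end value that the message reports as failing doloop_condition_get.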
