Hello,
I tried to use the doloop_end pattern to reduce loop overhead for our target
processor, which features a dedicated loop instruction.  However, even a
simple loop fails the check in doloop_condition_get, which requires the
following canonical pattern:

  /* The canonical doloop pattern we expect has one of the following
     forms:

     1)  (parallel [(set (pc) (if_then_else (condition)
                                            (label_ref (label))
                                            (pc)))
                     (set (reg) (plus (reg) (const_int -1)))
                     (additional clobbers and uses)])

     The branch must be the first entry of the parallel (also required
     by jump.c), and the second entry of the parallel must be a set of
     the loop counter register.  Some targets (IA-64) wrap the set of
     the loop counter in an if_then_else too.

     2)  (set (reg) (plus (reg) (const_int -1))
         (set (pc) (if_then_else (reg != 0)
                                 (label_ref (label))
                                 (pc))).  */
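For readers less used to RTL, here is a small C model of what canonical form 2 above computes: the counter is decremented by one, and the branch back to the label is taken while the new value is nonzero.  This is only a semantic illustration I wrote; the function name doloop_form2_model is hypothetical and not part of GCC, and it does not attempt to model form 1, where the decrement and branch sit in one parallel.

```c
#include <assert.h>

/* Hypothetical C model of canonical form 2: after the loop body runs,
   the counter register is decremented by one (the first set) and
   control branches back to the label while the new value is nonzero
   (the second set).  Returns how many times the body executed. */
unsigned int doloop_form2_model(unsigned int count)
{
    unsigned int reg = count;    /* (reg): the loop counter register */
    unsigned int body_runs = 0;

    /* A zero count would wrap around and run UINT_MAX + 1 times. */
    assert(count > 0);

loop_label:                      /* (label) */
    body_runs++;                 /* stands in for the loop body */
    reg = reg - 1;               /* (set (reg) (plus (reg) (const_int -1))) */
    if (reg != 0)                /* (if_then_else (reg != 0) ...) */
        goto loop_label;         /* (label_ref (label)); else fall through to (pc) */

    return body_runs;
}
```

With the counter initialised to 64, the body runs 64 times, which is exactly the shape a dedicated hardware loop instruction implements.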


Here is the simple function I used; it should meet all the requirements of
the doloop optimization:
void Unroll (short s, int *restrict b_inout, int *restrict out, int N)
{
        int i;
        /* Constant trip count of 64; N is unused.  */
        for (i = 0; i < 64; i++)
        {
                out[i] = b_inout[i] + s;
        }
}


After the tree ivcanon pass, it is converted to:
;; Function Unroll (Unroll)

Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
{
  unsigned int ivtmp.14;
  int pretmp.9;
  long unsigned int pretmp.8;
  int storetmp.6;
  int i;
  int D.1459;
  int D.1458;
  int D.1457;
  int * D.1456;
  int * D.1455;
  long unsigned int D.1454;
  long unsigned int D.1453;

<bb 2>:
  pretmp.9_8 = (int) s_12(D);

<bb 3>:
  # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
  # i_19 = PHI <i_15(4), 0(2)>
  D.1453_3 = (long unsigned int) i_19;
  D.1454_4 = D.1453_3 * 4;
  D.1455_6 = out_5(D) + D.1454_4;
  D.1456_10 = b_inout_9(D) + D.1454_4;
  D.1457_11 = *D.1456_10;
  D.1459_14 = pretmp.9_8 + D.1457_11;
  *D.1455_6 = D.1459_14;
  i_15 = i_19 + 1;
  ivtmp.14_21 = ivtmp.14_13 - 1;
  if (ivtmp.14_21 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

}
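To make the dump easier to compare with the canonical form, here is my own hand translation of it back into C.  The function name Unroll_after_ivcanon is hypothetical, the unused N parameter is dropped, and the byte-offset arithmetic (D.1454_4 = D.1453_3 * 4 added to the base pointers) is written as ordinary array indexing, which is equivalent.  The point is that ivcanon introduced a separate down-counter (ivtmp.14) that starts at 64 and exits when it reaches zero.

```c
/* Hand translation (illustration only) of the ivcanon GIMPLE for Unroll:
   i still counts up and drives the addressing, while ivtmp.14 counts
   down from 64 by one and the loop exits when it reaches zero. */
void Unroll_after_ivcanon(short s, int *restrict b_inout,
                          int *restrict out)
{
    int pretmp = (int) s;       /* pretmp.9_8: s widened once, before the loop */
    unsigned int ivtmp = 64;    /* ivtmp.14: PHI <..., 64(2)> */
    int i = 0;                  /* i_19: PHI <..., 0(2)> */

    do {
        out[i] = pretmp + b_inout[i];   /* D.1457_11 load, D.1459_14 store */
        i = i + 1;                      /* i_15 = i_19 + 1 */
        ivtmp = ivtmp - 1;              /* ivtmp.14_21 = ivtmp.14_13 - 1 */
    } while (ivtmp != 0);               /* if (ivtmp.14_21 != 0) goto <bb 4> */
}
```

The decrement-by-one followed by a test against zero is the same pattern as form 2 in the doloop comment.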


This should match the requirements of doloop_condition_get.  But after the
ivopts pass, the code is transformed to:

;; Function Unroll (Unroll)

Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
{
  long unsigned int ivtmp.21;
  unsigned int ivtmp.14;
  int pretmp.9;
  long unsigned int pretmp.8;
  int storetmp.6;
  int i;
  int D.1459;
  int D.1458;
  int D.1457;
  int * D.1456;
  int * D.1455;
  long unsigned int D.1454;
  long unsigned int D.1453;

<bb 2>:
  pretmp.9_8 = (int) s_12(D);

<bb 3>:
  # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
  D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
  D.1459_14 = pretmp.9_8 + D.1457_11;
  MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
  ivtmp.21_16 = ivtmp.21_7 + 4;
  if (ivtmp.21_16 != 256)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

}
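Translated back into C the same way (again my own illustration; the function name Unroll_after_ivopts is hypothetical and it assumes a 4-byte int, as the step of 4 in the dump implies), the only induction variable left is a byte offset that counts up from 0 and is compared against 256:

```c
/* Hand translation (illustration only) of the ivopts GIMPLE for Unroll.
   Both the down-counter and i have been replaced by one byte offset,
   ivtmp.21, so the exit test is no longer "decrement by one, branch
   while nonzero".  Assumes sizeof(int) == 4. */
void Unroll_after_ivopts(short s, int *restrict b_inout,
                         int *restrict out)
{
    int pretmp = (int) s;       /* pretmp.9_8 */
    unsigned long ivtmp = 0;    /* ivtmp.21: long unsigned int, PHI <..., 0(2)> */

    do {
        /* MEM[base: p, index: ivtmp] addresses p plus ivtmp bytes. */
        int v = *(int *) ((char *) b_inout + ivtmp);
        *(int *) ((char *) out + ivtmp) = pretmp + v;
        ivtmp = ivtmp + 4;                /* ivtmp.21_16 = ivtmp.21_7 + 4 */
    } while (ivtmp != 256);               /* if (ivtmp.21_16 != 256) goto <bb 4> */
}
```

The trip count is still implied by this loop, (256 - 0) / 4 = 64 iterations, but there is no longer an explicit counter that is decremented by one and tested against zero.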


It is no longer in the required canonical form, and later RTL-level
optimizations cannot convert it back.  Since it does not pass the
doloop_condition_get test, the modulo scheduling pass does not work
either.  Am I missing something here?  Any hint is greatly appreciated.

Cheers,
Bingfeng Mei

