Hello,
We are porting GCC 4.2.1 to our VLIW processor. To improve performance,
support for the restrict keyword is imperative. From what I have learned
from the GCC documentation, "restrict" should have been well supported
since GCC 3. However, I found that it doesn't improve the schedule even
for a simple example.


void foo (int * restrict a, int * restrict b, int * restrict c) {
  unsigned i;

  for (i=0; i<256; i++){
    a[i] = b[i] + c[i];
  }
}

This is not only a problem for our own port. I also tried compiling for
an ARM target:
arm-elf-gcc vectorize.c -O3 -std=c99 -S -funroll-all-loops -fdump-tree-all

It just generates load/load/store sequences in the code's natural
order. The scheduler never tries to move a load above the preceding
store instruction in order to reduce the cycle count.

foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        ldr     ip, [r2, #0]
        stmfd   sp!, {r4, lr}
        mov     r4, r1
        ldr     r1, [r1, #0]
        mov     lr, r2
        add     r2, ip, r1
        str     r2, [r0, #0]
        mov     ip, #4
        ldr     r1, [ip, lr]
        ldr     r3, [ip, r4]
        add     r2, r1, r3
        str     r2, [ip, r0]
        add     r3, ip, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     r3, r3, #4
        ldr     r2, [r3, lr]
        ldr     r1, [r3, r4]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     ip, ip, #12
.L2:
        ldr     r1, [ip, lr]
        ldr     r3, [ip, r4]
        add     r2, r1, r3
        str     r2, [ip, r0]
        add     r3, ip, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     r3, r3, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
      ...
        ldr     r3, [r1, lr]
        add     r3, r3, r2
        str     r3, [r1, r0]
        add     r2, ip, #32
        ldr     r3, [r2, lr]
        ldr     r1, [r2, r4]
        add     ip, ip, #36
        add     r3, r3, r1
        cmp     ip, #1024
        str     r3, [r2, r0]
        bne     .L2
        ldmfd   sp!, {r4, pc}
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.2.2"
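
For comparison, the kind of reordering that restrict should permit can be
sketched by hand in C. This is only an illustration: foo_hoisted is a
made-up name, and the unroll factor of 2 is an arbitrary assumption, not
what GCC or any particular scheduler would pick.

```c
/* Hypothetical hand-scheduled variant of foo, unrolled by 2 (arbitrary).
   All four loads of an iteration are issued before either store. That is
   only valid if a, b and c never overlap, which is what restrict promises. */
void foo_hoisted(int * restrict a, int * restrict b, int * restrict c)
{
  unsigned i;

  for (i = 0; i < 256; i += 2) {
    int b0 = b[i],     c0 = c[i];
    int b1 = b[i + 1], c1 = c[i + 1];  /* loads moved above the store to a[i] */

    a[i]     = b0 + c0;
    a[i + 1] = b1 + c1;
  }
}
```

In the assembly above, by contrast, every pair of loads still follows the
store of the previous element.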

I examined the tree-SSA dump files that were produced.

In 004t.gimple, the restrict keyword is preserved:
foo (a, b, c)
{
  unsigned int D.1352;
  int * D.1353;
  int * D.1354;
  int * D.1355;
  int D.1356;
  int * D.1357;
  int D.1358;
  int D.1359;
  unsigned int i;

  i = 0;
  goto <D1350>;
  <D1349>:;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1354 = D.1353 + a;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1355 = D.1353 + b;
  D.1356 = *D.1355;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1357 = D.1353 + c;
  D.1358 = *D.1357;
  D.1359 = D.1356 + D.1358;
  *D.1354 = D.1359;
  i = i + 1;
  <D1350>:;
  if (i <= 255)
    {
      goto <D1349>;
    }
  else
    {
      goto <D1351>;
    }
  <D1351>:;
}

But in the .final_cleanup file, the restrict keyword just disappears:
foo (a, b, c)
{
  long unsigned int ivtmp.49;

<bb 2>:
  MEM[base: a] = MEM[base: c] + MEM[base: b];
  ivtmp.49 = 4;

<L0>:;
  MEM[base: a, index: ivtmp.49] = MEM[base: c, index: ivtmp.49] + MEM[base: b, index: ivtmp.49];
  ivtmp.49 = ivtmp.49 + 4;
  if (ivtmp.49 != 1024) goto <L0>; else goto <L2>;

<L2>:;
  return;

}

Any hints on how to produce efficient code with the "restrict" keyword?
Thanks in advance.

Cheers,
Bingfeng Mei 
Broadcom UK
