Hello,

We are porting GCC 4.2.1 to our VLIW processor. To improve performance, support for the restrict keyword is imperative. From what I have read in the GCC documentation, "restrict" should be well supported since GCC 3. However, I found that it does not improve scheduling even for a simple example:

void foo (int * restrict a, int * restrict b, int * restrict c)
{
  unsigned i;
  for (i = 0; i < 256; i++) {
    a[i] = b[i] + c[i];
  }
}

This is not only a problem for our own port. I also tried compiling for the ARM target:

arm-elf-gcc vectorize.c -O3 -std=c99 -S -funroll-all-loops -fdump-tree-all

It just generates load/load/add/store sequences in the natural order of the source. The scheduler never tries to move a load above the preceding store in order to reduce the cycle count.

foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        ldr     ip, [r2, #0]
        stmfd   sp!, {r4, lr}
        mov     r4, r1
        ldr     r1, [r1, #0]
        mov     lr, r2
        add     r2, ip, r1
        str     r2, [r0, #0]
        mov     ip, #4
        ldr     r1, [ip, lr]
        ldr     r3, [ip, r4]
        add     r2, r1, r3
        str     r2, [ip, r0]
        add     r3, ip, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     r3, r3, #4
        ldr     r2, [r3, lr]
        ldr     r1, [r3, r4]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     ip, ip, #12
.L2:
        ldr     r1, [ip, lr]
        ldr     r3, [ip, r4]
        add     r2, r1, r3
        str     r2, [ip, r0]
        add     r3, ip, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
        add     r3, r3, #4
        ldr     r1, [r3, r4]
        ldr     r2, [r3, lr]
        add     r2, r2, r1
        str     r2, [r3, r0]
        ...
        ldr     r3, [r1, lr]
        add     r3, r3, r2
        str     r3, [r1, r0]
        add     r2, ip, #32
        ldr     r3, [r2, lr]
        ldr     r1, [r2, r4]
        add     ip, ip, #36
        add     r3, r3, r1
        cmp     ip, #1024
        str     r3, [r2, r0]
        bne     .L2
        ldmfd   sp!, {r4, pc}
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.2.2"

I examined the tree-SSA dump files. In 004t.gimple, the restrict keyword is preserved:

foo (a, b, c)
{
  unsigned int D.1352;
  int * D.1353;
  int * D.1354;
  int * D.1355;
  int D.1356;
  int * D.1357;
  int D.1358;
  int D.1359;
  unsigned int i;

  i = 0;
  goto <D1350>;
  <D1349>:;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1354 = D.1353 + a;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1355 = D.1353 + b;
  D.1356 = *D.1355;
  D.1352 = i * 4;
  D.1353 = (int * restrict) D.1352;
  D.1357 = D.1353 + c;
  D.1358 = *D.1357;
  D.1359 = D.1356 + D.1358;
  *D.1354 = D.1359;
  i = i + 1;
  <D1350>:;
  if (i <= 255)
    {
      goto <D1349>;
    }
  else
    {
      goto <D1351>;
    }
  <D1351>:;
}

But in the .final_cleanup file, the restrict keyword has simply disappeared:

foo (a, b, c)
{
  long unsigned int ivtmp.49;

<bb 2>:
  MEM[base: a] = MEM[base: c] + MEM[base: b];
  ivtmp.49 = 4;

<L0>:;
  MEM[base: a, index: ivtmp.49] = MEM[base: c, index: ivtmp.49] + MEM[base: b, index: ivtmp.49];
  ivtmp.49 = ivtmp.49 + 4;
  if (ivtmp.49 != 1024) goto <L0>; else goto <L2>;

<L2>:;
  return;
}

Any hints on how to produce efficient code with the "restrict" keyword? Thanks in advance.

Cheers,
Bingfeng Mei
Broadcom UK
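
P.S. To make the question concrete, here is a minimal hand-written sketch of the kind of reordering that restrict should make legal: the loads for the next element are issued before the store for the current one. The name foo_hoisted and the unroll factor of 2 are assumptions chosen for illustration; this is not compiler output, and a real scheduler might choose a different interleaving.

```c
/* Hypothetical hand-scheduled variant of foo(), unrolled by 2.
   Since a, b and c are restrict-qualified, the store to a[i] cannot
   alias b[i + 1] or c[i + 1], so those loads may be hoisted above it. */
void foo_hoisted (int * restrict a, int * restrict b, int * restrict c)
{
  unsigned i;
  for (i = 0; i < 256; i += 2) {
    /* Issue all four loads first ... */
    int b0 = b[i];
    int c0 = c[i];
    int b1 = b[i + 1];
    int c1 = c[i + 1];
    /* ... then do the adds and stores. */
    a[i] = b0 + c0;
    a[i + 1] = b1 + c1;
  }
}
```

For non-overlapping arrays this computes the same result as foo(); without restrict, the compiler would have to assume that the store to a[i] might change b[i + 1] or c[i + 1] and keep the original order.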