------- Comment #7 from rguenth at gcc dot gnu dot org  2009-09-30 08:59 -------
Hm, on x86_64 with -O3 -funroll-loops I see

f:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        movdqu  (%rsi,%rax), %xmm7
        movdqu  %xmm7, (%rdi,%rax)
        movdqu  16(%rsi,%rax), %xmm6
        movdqu  %xmm6, 16(%rdi,%rax)
        movdqu  32(%rsi,%rax), %xmm5
        movdqu  %xmm5, 32(%rdi,%rax)
        movdqu  48(%rsi,%rax), %xmm4
        movdqu  %xmm4, 48(%rdi,%rax)
        movdqu  64(%rsi,%rax), %xmm3
        movdqu  %xmm3, 64(%rdi,%rax)
        movdqu  80(%rsi,%rax), %xmm2
        movdqu  %xmm2, 80(%rdi,%rax)
        movdqu  96(%rsi,%rax), %xmm1
        movdqu  %xmm1, 96(%rdi,%rax)
        movdqu  112(%rsi,%rax), %xmm0
        movdqu  %xmm0, 112(%rdi,%rax)
        subq    $-128, %rax
        cmpq    $4096, %rax
        jne     .L2
        rep
        ret

which looks pretty optimal to me ...

If you disable vectorization then restrict doesn't make a difference anymore
because of TARGET_MEM_REFs:

<bb 3>:
  # i_15 = PHI <i_10(3), 0(2)>
  D.2707_9 = MEM[base: s_7(D), index: i_15, step: 4];
  MEM[base: t_4(D), index: i_15, step: 4] = D.2707_9;
  i_10 = i_15 + 1;
  if (i_10 != 1024)
    goto <bb 3>;

The MEM_EXPRs we get from expansion do not have points-to information anymore.
It's possible to fix that, though it might not be without losses elsewhere
(TMRs suck).

Well, I'll try to have a look here during stage3.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2009-09-29 21:18:52         |2009-09-30 08:59:39
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22031
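
For readers following the dumps in this comment: the testcase itself is not quoted here. A loop consistent with them (1024 iterations, 4-byte elements, 4096 bytes in total, stores through %rdi and loads through %rsi) might look like the C sketch below. The exact signature, the element type, and the helper functions count_mismatches and self_check are assumptions made for illustration, not the PR's actual testcase.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* trip count seen in the GIMPLE dump: if (i_10 != 1024) */

/* Assumed reconstruction of f: both pointers are restrict-qualified,
   so the compiler may assume the two arrays do not overlap.
   t is the first argument (%rdi), s the second (%rsi). */
void f(int *restrict t, int *restrict s)
{
    for (size_t i = 0; i < N; i++)
        t[i] = s[i];            /* 4-byte elements, matching "step: 4" */
}

/* Hypothetical helper: copies a known pattern through f and returns
   the number of elements that differ afterwards. */
int count_mismatches(void)
{
    static int src[N], dst[N];
    int bad = 0;

    for (int i = 0; i < N; i++) {
        src[i] = 3 * i + 1;
        dst[i] = -1;
    }
    f(dst, src);
    for (int i = 0; i < N; i++)
        if (dst[i] != src[i])
            bad++;
    return bad;
}

/* Hypothetical helper: aborts if the copy is not exact. */
void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Compiling such a file with -O3 -funroll-loops -S should give a loop comparable to the assembly above. Adding -fno-tree-vectorize disables the vectorizer, and -fdump-tree-optimized writes out the GIMPLE, so the TARGET_MEM_REF form can be inspected; the output will vary with the GCC version.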