On Sun, Nov 29, 2009 at 7:12 PM, Toon Moene <t...@moene.org> wrote:
> Toon Moene wrote:
>
>> This is where IPA could help. I created the following main program:
>>
>>       real a(10), b(10), c(10)
>>       a = 0.
>>       b = 1.
>>       print '(3(1x,z16))', loc(a), loc(b), loc(c)
>>       call sum(a, b, c, 10)
>>       print *, c(5)
>>       end
>
>> So the alignment of a, b and c is known and is correct for vectorization -
>> still the loop in the subroutine looks like this (objdump -S a.out):
>
> Inlining the "sum.f" subroutine by hand:
>
>       integer i
>       real a(10), b(10), c(10)
>       a = 0.
>       b = 1.
>       print '(3(1x,z16))', loc(a), loc(b), loc(c)
>       do i = 1, 10
>          c(i) = a(i) + b(i)
>       enddo
>       print *, c(5)
>       end
>
> *does* lead to better code:
>
>         movaps  1056(%rsp), %xmm0
>         movq    %rbp, %rdi
>         addps   1008(%rsp), %xmm0
>         movq    $.LC2, 488(%rsp)
>         movaps  %xmm0, 960(%rsp)
>         movl    $9, 496(%rsp)
>         movaps  1072(%rsp), %xmm0
>         movl    $128, 480(%rsp)
>         addps   1024(%rsp), %xmm0
>         movl    $6, 484(%rsp)
>         movaps  %xmm0, 976(%rsp)
>         movss   1088(%rsp), %xmm0
>         addss   1040(%rsp), %xmm0
>         movss   %xmm0, 992(%rsp)
>         movss   1092(%rsp), %xmm0
>         addss   1044(%rsp), %xmm0
>         movss   %xmm0, 996(%rsp)
>
> i.e., a completely unrolled and (SLP) vectorized code.
>
> So the potential is there - what we just need is an Alignment Propagation
> Pass (analogous to the Constant and the Range Propagation pass).

Such a thing already existed a few years ago (IIRC Haifa had something
that Dan picked up and passed on to me). But it never brought any
benefits. I don't have the pass anymore, but perhaps Dan still has a
copy of it somewhere.

Ciao!
Steven