On Sun, Nov 29, 2009 at 7:12 PM, Toon Moene <t...@moene.org> wrote:
> Toon Moene wrote:
>
>> This is where IPA could help.  I created the following main program:
>>
>>      real a(10), b(10), c(10)
>>      a = 0.
>>      b = 1.
>>      print '(3(1x,z16))', loc(a), loc(b), loc(c)
>>      call sum(a, b, c, 10)
>>      print *, c(5)
>>      end
>
>> So the alignment of a, b and c is known and is correct for vectorization -
>> still the loop in the subroutine looks like this (objdump -S a.out):
>
> Inlining the "sum.f" subroutine by hand:
>
>      integer i
>      real a(10), b(10), c(10)
>      a = 0.
>      b = 1.
>      print '(3(1x,z16))', loc(a), loc(b), loc(c)
>      do i = 1, 10
>         c(i) = a(i) + b(i)
>      enddo
>      print *, c(5)
>      end
>
> *does* lead to better code:
>
>        movaps  1056(%rsp), %xmm0
>        movq    %rbp, %rdi
>        addps   1008(%rsp), %xmm0
>        movq    $.LC2, 488(%rsp)
>        movaps  %xmm0, 960(%rsp)
>        movl    $9, 496(%rsp)
>        movaps  1072(%rsp), %xmm0
>        movl    $128, 480(%rsp)
>        addps   1024(%rsp), %xmm0
>        movl    $6, 484(%rsp)
>        movaps  %xmm0, 976(%rsp)
>        movss   1088(%rsp), %xmm0
>        addss   1040(%rsp), %xmm0
>        movss   %xmm0, 992(%rsp)
>        movss   1092(%rsp), %xmm0
>        addss   1044(%rsp), %xmm0
>        movss   %xmm0, 996(%rsp)
>
> i.e., completely unrolled and (SLP) vectorized code.
>
> So the potential is there - all we need is an Alignment Propagation
> Pass (analogous to the Constant and the Range Propagation passes).

Such a thing already existed a few years ago (IIRC Haifa had something
that Dan picked up and passed on to me). But it never brought any
benefits. I don't have the pass anymore, but perhaps Dan still has a
copy of it somewhere.
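
To make the idea concrete, here is a rough sketch of the kind of
lattice such a pass might propagate. It is not the old pass and not
GCC code. The names (align_info, align_from_addr, align_meet,
align_for_param) are made up for illustration. A fact says that an
address satisfies addr % align == misalign, with align a power of
two. The fact for a dummy argument is the meet over the actual
arguments at all call sites.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lattice element: the address is known to satisfy
   addr % align == misalign, where align is a power of two.
   align == 1 means nothing useful is known.  */
struct align_info {
    uintptr_t align;
    uintptr_t misalign;
};

/* Fact derived from a concrete address, capped at max_align
   (assumed to be a power of two, e.g. 16 for SSE).  */
static struct align_info align_from_addr(uintptr_t addr, uintptr_t max_align)
{
    struct align_info r = { max_align, addr % max_align };
    return r;
}

/* Meet: the strongest fact that is implied by both x and y.  */
static struct align_info align_meet(struct align_info x, struct align_info y)
{
    uintptr_t a = x.align < y.align ? x.align : y.align;

    /* Halve the alignment until both misalignments agree modulo a.
       This always terminates, since everything agrees modulo 1.  */
    while (a > 1 && (x.misalign % a) != (y.misalign % a))
        a /= 2;

    struct align_info r = { a, x.misalign % a };
    return r;
}

/* Fact for a dummy argument: meet over the actual arguments
   seen at all n call sites.  */
static struct align_info align_for_param(const struct align_info *actuals, int n)
{
    assert(n > 0);
    struct align_info r = actuals[0];
    for (int i = 1; i < n; i++)
        r = align_meet(r, actuals[i]);
    return r;
}
```

With this sketch, if every call passes a 16-byte aligned array, the
dummy argument keeps { 16, 0 } and aligned vector loads would be
safe. A single call site with an address that is only 8-byte aligned
lowers the result to { 8, 0 }.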

Ciao!
Steven
