Richard Guenther wrote:
On Sat, Nov 28, 2009 at 4:26 PM, Tim Prince <n...@aol.com> wrote:
Toon Moene wrote:
H.J. Lu wrote:
On Sat, Nov 28, 2009 at 3:21 AM, Toon Moene <t...@moene.org> wrote:
L.S.,
Prompted by the discussion on register allocation, I went back to a hobby of mine: studying the assembly output of the compiler.
For this Fortran subroutine (note: unless the Fortran front end is told otherwise, reals are 32-bit floating point numbers):
subroutine sum(a, b, c, n)
   integer i, n
   real a(n), b(n), c(n)
   do i = 1, n
      c(i) = a(i) + b(i)
   enddo
end
with -O3 -S (GCC: (GNU) 4.5.0 20091123), I get this (vectorized) loop:
        xorps   %xmm2, %xmm2
        ....
.L6:
        movaps  %xmm2, %xmm0
        movaps  %xmm2, %xmm1
        movlps  (%r9,%rax), %xmm0
        movlps  (%r8,%rax), %xmm1
        movhps  8(%r9,%rax), %xmm0
        movhps  8(%r8,%rax), %xmm1
        incl    %ecx
        addps   %xmm1, %xmm0
        movaps  %xmm0, 0(%rbp,%rax)
        addq    $16, %rax
        cmpl    %ebx, %ecx
        jb      .L6
I'm not a master of x86_64 assembly, but it strongly looks like %xmm{0,1} have to be zeroed (%xmm2 is set to zero by xor'ing it with itself) before they are completely filled with the mov{l,h}ps instructions?
I think it is used to avoid a partial SSE register stall.
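To make that concrete, here is a hand-written C sketch of the same split-load pattern using SSE intrinsics (illustrative only: the function name is made up, and this is just an equivalent of the generated loop body, not the compiler's code). movlps and movhps each write only half of the destination register, so each destination is cleared first with xorps, a dependency-breaking idiom, instead of leaving a false dependence on whatever the register held before.

#include <xmmintrin.h>

/* Hand-written illustration of the split-load loop body; not compiler output. */
void add4_split(const float *a, const float *b, float *c)
{
    __m128 va = _mm_setzero_ps();                   /* xorps: break dependence on old contents */
    __m128 vb = _mm_setzero_ps();
    va = _mm_loadl_pi(va, (const __m64 *)a);        /* movlps: load low 64 bits  */
    va = _mm_loadh_pi(va, (const __m64 *)(a + 2));  /* movhps: load high 64 bits */
    vb = _mm_loadl_pi(vb, (const __m64 *)b);
    vb = _mm_loadh_pi(vb, (const __m64 *)(b + 2));
    _mm_store_ps(c, _mm_add_ps(va, vb));            /* addps; the store assumes c is 16-byte
                                                       aligned, matching the movaps store above */
}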
You mean there's no movaps (%r9,%rax), %xmm0 (and mutatis mutandis for %xmm1) instruction (to copy 4*32 bits to the register)?
If you want those, you must request them with -mtune=barcelona.
Which would then get you movups (%r9,%rax), %xmm0 (unaligned move).
generic tuning prefers the split moves; AMD Fam10 and above handle unaligned moves just fine.
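For comparison, the unaligned-load form discussed here would look roughly like this in the same intrinsics style (again hand-written and illustrative, not actual -mtune=barcelona output): one movups per operand replaces the xorps + movlps + movhps sequence.

#include <xmmintrin.h>

/* Hand-written illustration of the movups variant; not compiler output. */
void add4_movups(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);            /* movups: single unaligned 128-bit load */
    __m128 vb = _mm_loadu_ps(b);
    _mm_store_ps(c, _mm_add_ps(va, vb));    /* store kept aligned, as in the original loop */
}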
Correct, the movaps would have been used if alignment were recognized.
The newer CPUs achieve full performance with movups.
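And the movaps form that recognized alignment would allow looks like this; _mm_load_ps maps to movaps and requires 16-byte-aligned pointers, so the sketch is only valid when a, b and c are all suitably aligned (hand-written illustration, not compiler output).

#include <xmmintrin.h>

/* Hand-written illustration of the fully aligned movaps variant; not compiler output. */
void add4_movaps(const float *a, const float *b, float *c)
{
    __m128 va = _mm_load_ps(a);             /* movaps: aligned 128-bit load */
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(c, _mm_add_ps(va, vb));
}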
Do you consider Core i7/Nehalem as included in "AMD Fam10 and above"?