Benjamin Redelings I wrote:
Hi,

I have been playing with the GCC vectorizer and examining assembly code that is produced for dot products that are not for a fixed number of elements. (This comes up surprisingly often in scientific codes.) So far, the generated code is not faster than non-vectorized code, and I think that it is because I can't find a way to tell the compiler that the target of a double* is 16-byte aligned.
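
A minimal sketch of the kind of hint I mean, using GCC's __builtin_assume_aligned builtin (only present where the compiler supports it, so this is an illustration of what I'm after rather than something I can rely on):

double dot(const double* p_, const double* q_, int n)
{
      // Promise the compiler that both arrays are 16-byte aligned;
      // the builtin returns void*, so the result must be cast back.
      const double* p = static_cast<const double*>(__builtin_assume_aligned(p_, 16));
      const double* q = static_cast<const double*>(__builtin_assume_aligned(q_, 16));

      double sum = 0;
      for (int i = 0; i < n; i++)
            sum += p[i] * q[i];
      return sum;
}

With that promise the vectorizer could use aligned loads (movapd) instead of unaligned ones (movupd).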

From PR 27827 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827):
"I just quickly glanced at the code, and I see that it never uses "movapd" from memory, which is a key to getting decent performance."

If the goal is peak performance on some old CPU, how many people would actually take advantage of special machinery for it?

Simplifying your example to:

double f3(const double* p_, const double* q_, int n)
{
      double sum = 0;
      for (int i = 0; i < n; i++)
            sum += p_[i] * q_[i];

      return sum;
}
g++ -S -O3 -march=pentium-m -ffast-math -mtune=barcelona -mfpmath=sse
(options chosen for my discontinued OS on a discontinued CPU)
produces this loop body:

         .p2align 5,,24
 L4:
         movupd  (%ebx,%eax), %xmm0
         movupd  (%ecx,%eax), %xmm2
         incl    %edx
         addl    $16, %eax
         cmpl    %edx, %edi
         mulpd   %xmm2, %xmm0
         addpd   %xmm0, %xmm1
         ja      L4

On CPUs introduced in the last two years, movupd should be as fast as movapd, and -mtune=barcelona should work well in general, not only in this example. For longer loops, the bigger performance difference would come from further batching of partial sums, favoring loop lengths that are multiples of 4 (or 8, with unrolling). That alignment already favors a fairly long loop.
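
To illustrate the batching, here is a hand-unrolled scalar sketch with four independent partial sums; it shows the general technique, not the exact code either compiler emits (f3_batched is only an illustrative name):

double f3_batched(const double* p, const double* q, int n)
{
      // Four independent accumulators let the additions overlap in the
      // pipeline instead of serializing on one running sum.
      double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
      int i = 0;
      for (; i + 4 <= n; i += 4) {
            s0 += p[i]   * q[i];
            s1 += p[i+1] * q[i+1];
            s2 += p[i+2] * q[i+2];
            s3 += p[i+3] * q[i+3];
      }
      double sum = (s0 + s1) + (s2 + s3);
      for (; i < n; i++)      // remainder when n is not a multiple of 4
            sum += p[i] * q[i];
      return sum;
}

-ffast-math is what permits the compiler to reassociate the sum this way on its own; without it, the strict left-to-right summation order blocks the transformation.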

As you're using C++, it seems you could have used std::inner_product() rather than writing out the function.
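
That is, something along these lines (f3_std is only an illustrative name):

#include <numeric>

// Same computation via the standard library; with -O3 -ffast-math this
// should vectorize much like the hand-written loop.
double f3_std(const double* p, const double* q, int n)
{
      return std::inner_product(p, p + n, q, 0.0);
}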

On my Core i7, a 25x25 by 25x100 matrix multiply reaches 17 Gflops with gfortran in-line code; g++ produces about 80% of that.

