Re: On the x86_64, does one have to zero a vector register before filling it completely ?

Tim Prince Sat, 28 Nov 2009 13:01:04 -0800

Toon Moene wrote:

Toon Moene wrote:

Tim Prince wrote:

 > If you want those, you must request them with -mtune=barcelona.

OK, so it is an alignment issue (with -mtune=barcelona):

.L6:
        movups  0(%rbp,%rax), %xmm0
        movups  (%rbx,%rax), %xmm1
        incl    %ecx
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%r8,%rax)
        addq    $16, %rax
        cmpl    %r10d, %ecx
        jb      .L6

Once this problem is solved (well, once it is determined how it could be solved), we go on to the next: the extraneous induction variable %ecx.


There are two ways to deal with it:

1. Eliminate it with respect to the other induction variable that
   counts in the same direction (upwards, with steps 16) and remember
   that induction variable's (%rax) limit.

or:

2. Count %ecx down from %r10d to zero (which eliminates %r10d as a loop
   carried register).
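The two options can be sketched at the source level in C. This is only an illustration under assumptions: the function names, the 4-float vector width and the `nvec` parameter are made up here, and the first function merely mimics the shape of the compiled loop above (an offset like %rax plus a separate counter like %ecx).

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the loop as compiled: an offset (like %rax) plus a separate
   iteration counter (like %ecx) compared against a limit (like %r10d). */
void add_two_counters(float *c, const float *a, const float *b, unsigned nvec)
{
    size_t off = 0;                       /* steps by one 4-float vector */
    assert(a && b && c);
    for (unsigned i = 0; i < nvec; i++) {
        for (int k = 0; k < 4; k++)
            c[off + k] = a[off + k] + b[off + k];
        off += 4;
    }
}

/* Option 1: drop the counter and test the offset against its own limit. */
void add_offset_limit(float *c, const float *a, const float *b, unsigned nvec)
{
    size_t limit = (size_t)nvec * 4;      /* the remembered limit of the offset */
    assert(a && b && c);
    for (size_t off = 0; off != limit; off += 4)
        for (int k = 0; k < 4; k++)
            c[off + k] = a[off + k] + b[off + k];
}

/* Option 2: keep a counter but run it down to zero, so the upper limit
   does not have to stay live in a register across iterations. */
void add_count_down(float *c, const float *a, const float *b, unsigned nvec)
{
    size_t off = 0;
    assert(a && b && c);
    for (unsigned n = nvec; n != 0; n--) {
        for (int k = 0; k < 4; k++)
            c[off + k] = a[off + k] + b[off + k];
        off += 4;
    }
}
```

All three compute the same result; whether a given compiler emits the tighter loop for either rewritten form is not guaranteed and would have to be checked in the generated assembly.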

g77 avoided this by coding counted do loops with a separate loop counter counting down to zero - not so with gfortran (quoting):


/* Translate the simple DO construct.  This is where the loop variable
   has integer type and step +-1.  We can't use this in the general case
   because integer overflow and floating point errors could give
   incorrect results.
   We translate a do loop from:

   DO dovar = from, to, step
      body
   END DO

   to:

   [Evaluate loop bounds and step]
   dovar = from;
   if ((step > 0) ? (dovar <= to) : (dovar => to))
    {
      for (;;)
        {
          body;
   cycle_label:
          cond = (dovar == to);
          dovar += step;
          if (cond) goto end_label;
        }
      }
   end_label:

   This helps the optimizers by avoiding the extra induction variable
   used in the general case.  */
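For comparison, here is a rough C rendering of both schemes. It is a sketch under assumptions, not the code either front end actually generates: the function names and the summing loop body are invented for illustration, and the count-down version uses the standard Fortran trip-count formula MAX((to - from + step)/step, 0).

```c
#include <assert.h>

/* Simple DO as in the quoted comment: step is +1 or -1 and the exit
   test is dovar == to, so no separate counter is needed. */
long sum_simple_do(long from, long to, long step)
{
    long sum = 0;
    long dovar = from;
    assert(step == 1 || step == -1);
    if ((step > 0) ? (dovar <= to) : (dovar >= to)) {
        for (;;) {
            sum += dovar;                 /* body */
            int cond = (dovar == to);
            dovar += step;
            if (cond)
                break;                    /* goto end_label */
        }
    }
    return sum;
}

/* General form with a separate counter running down to zero. */
long sum_count_down(long from, long to, long step)
{
    long sum = 0;
    long dovar = from;
    assert(step != 0);
    long count = (to - from + step) / step;   /* trip count */
    if (count < 0)
        count = 0;
    for (; count > 0; count--) {
        sum += dovar;                     /* body */
        dovar += step;
    }
    return sum;
}
```

For example, both functions return 55 for the bounds 1 to 10 with step 1, and `sum_count_down(1, 10, 3)` visits 1, 4, 7 and 10. The count-down form carries two induction variables (`dovar` and `count`) where the simple form carries one.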

So either we teach the Fortran front end this trick, or we teach the loop optimization the trick of flipping the sense of an (otherwise unused) induction variable ....

This would have paid off more frequently in i386 mode, where there is a possibility of integer register pressure in loops small enough for such an optimization to succeed. This seems to be among the types of optimizations envisioned for run-time binary interpretation systems.
