https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65084
Bug ID: 65084
Summary: Lack of type narrowing/widening inhibits good vectorization
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at redhat dot com

These are testcases extracted from PR 47477.

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      short c = (char) a[i] + 5;
      long long d = (long long) b[i] + 12;
      a[i] = c + d;
    }
}

Compiled with -O3 -mavx2 we ought to get something similar to:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = ((short) (a[i] << 8) >> 8) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

though even in this case I still couldn't get the sign extension to actually be performed as a 16-bit left shift + arithmetic right shift, which I suspect would lead to even better code.

Or look at how we vectorize:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned char e = a[i];
      short c = e + 5;
      long long d = (long long) b[i] + 12;
      a[i] = c + d;
    }
}

(note: here the forwprop pass already performs type promotion; instead of converting a[i] to unsigned char and back to short, it computes a[i] & 255 in short mode) and how we could instead vectorize it with type demotions:

short a[1024], b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = (a[i] & 0xff) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

These are all admittedly artificial testcases, but I've seen tons of loops where multiple types were vectorized, and I think in some portion of those loops we could either use just a single type size, or at least reduce the number of conversions and distinct type sizes in the vectorized loops.