http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

             Bug #: 50789
           Summary: Gather vectorization
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ja...@gcc.gnu.org
                CC: hjl.to...@gmail.com, i...@gcc.gnu.org,
                    kirill.yuk...@intel.com


This is to track progress on vectorization using AVX2 v*gather* instructions.

The instructions allow plain unconditional gather, e.g.:
#define N 1024
float f[N];
int k[N];
float *l[N];
int **m[N];

float
f1 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += f[k[i]];
  return g;
}

float
f2 (float *p)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += p[k[i]];
  return g;
}

float
f3 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += *l[i];
  return g;
}

int
f4 (void)
{
  int i;
  int g = 0;
  for (i = 0; i < N; i++)
    g += **m[i];
  return g;
}

We should be able to vectorize all 4 loops.  In f1/f2 the gather would use a
non-zero base (the vector would contain just indexes into some array, which
vgather sign-extends and adds to the base); in f3/f4 it would use a zero base,
and the vectors would be vectors of pointers (resp. uintptr_t).
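
As a rough illustration of the non-zero base case, here is a scalar model of
what a vectorized f1 might compute.  It is only a sketch and not what GCC
emits: LANES, gather_ps_model and f1_gather are hypothetical names, and the
gather is emulated in plain C rather than done with a real v*gather*
instruction.  Note that keeping one partial sum per lane reassociates the
float reduction, which a compiler would normally do only when that is
permitted (e.g. under -ffast-math).

```c
#include <assert.h>
#include <stddef.h>

#define N 1024
#define LANES 8			/* Assumed: eight 32-bit lanes per 256-bit vector.  */

static_assert (N % LANES == 0, "this sketch has no scalar epilogue");

float f[N];
int k[N];

/* Hypothetical scalar model of an 8-lane float gather: each 32-bit index
   is sign-extended and used to address an element relative to BASE.  */
static void
gather_ps_model (float dst[LANES], const float *base, const int idx[LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    dst[lane] = base[(ptrdiff_t) idx[lane]];
}

/* f1 from the report, strip-mined by LANES.  The index vector is simply
   k[i .. i + LANES - 1] and the base is the array f.  */
float
f1_gather (void)
{
  float acc[LANES] = { 0 };

  for (int i = 0; i < N; i += LANES)
    {
      float v[LANES];
      gather_ps_model (v, f, &k[i]);
      for (int lane = 0; lane < LANES; lane++)
	acc[lane] += v[lane];
    }

  /* Horizontal reduction of the per-lane partial sums.  */
  float g = 0.0f;
  for (int lane = 0; lane < LANES; lane++)
    g += acc[lane];
  return g;
}
```

For f3/f4 the same model would apply with a null base and a vector of full
pointers instead of 32-bit indexes.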

To vectorize the above, I'm afraid we'd need to modify tree-data-ref.c as well
as tree-vect-data-ref.c, because the memory accesses aren't affine:
dr_analyze_innermost already gives up on those and doesn't fill in any of the
DR_* fields.  Perhaps, under some flag, when the base resp. offset has a vdef
in the same loop we could mark the data reference somehow and at least fill in
the other fields.  That would probably make alias decisions (in
tree-vect-data-ref.c?) harder.  Any ideas?

Additionally, the loads can be made conditional, whether affine or not.
So something like:
for (i = 0; i < N; i++)
  {
    c = 6;
    if (a[i] > 24)
      c = b[i];
    d[i] = c + e[i];
  }
for affine conditional accesses, where the index vector could be just
{ 0, 1, 2, 3, ... } and the mask comes from the comparison.
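
A scalar model of that masked form might look like the following.  Again this
is only a sketch under assumptions: LANES, mask_gather_model and
cond_load_loop are hypothetical names, the arrays are assumed to be int, and
the masked gather is emulated in plain C.  It relies on the masked v*gather*
behaviour that lanes with a clear mask keep their previous value and are not
loaded.

```c
#include <assert.h>

#define N 1024
#define LANES 8			/* Assumed: eight 32-bit lanes per 256-bit vector.  */

static_assert (N % LANES == 0, "this sketch has no scalar epilogue");

int a[N], b[N], d[N], e[N];

/* Hypothetical scalar model of a masked gather: lanes whose mask is set
   load base[idx[lane]]; the other lanes keep the value already in DST
   and their address is never dereferenced.  */
static void
mask_gather_model (int dst[LANES], const int *base, const int idx[LANES],
		   const int mask[LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    if (mask[lane])
      dst[lane] = base[idx[lane]];
}

/* The conditional loop from the report, strip-mined by LANES.  */
void
cond_load_loop (void)
{
  for (int i = 0; i < N; i += LANES)
    {
      int c[LANES], idx[LANES], mask[LANES];

      for (int lane = 0; lane < LANES; lane++)
	{
	  c[lane] = 6;				/* Default value of c.  */
	  idx[lane] = lane;			/* { 0, 1, 2, 3, ... }  */
	  mask[lane] = a[i + lane] > 24;	/* Mask from the comparison.  */
	}

      /* Conditional load of b[i + lane] into the lanes selected by mask.  */
      mask_gather_model (c, &b[i], idx, mask);

      for (int lane = 0; lane < LANES; lane++)
	d[i + lane] = c[lane] + e[i + lane];
    }
}
```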
