On Fri, 12 Oct 2018, Thomas Schwinge wrote:

Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
     {
       int j = tmp / N_I;
       int i = tmp % N_I;
       a[j][i] = 0;
     }

... whereas the following variant (obviously) does vectorize:

   int a[N_J * N_I];

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
     a[tmp] = 0;

I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get

  j_10 = tmp_17 / 1025;
  i_11 = tmp_17 % 1025;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1025;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

or for a power of 2

  j_10 = tmp_17 >> 10;
  i_11 = tmp_17 & 1023;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1024;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17, since (tmp / N) * N + tmp % N == tmp for the non-negative tmp here (at least I think that's true).

So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed.
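
As a quick sanity check of the identity above, here is a minimal C sketch that recomputes the index the way the dumps do and compares it with the plain cast. The helper names split_index and count_mismatches are made up for illustration and are not part of GCC; the sketch also assumes tmp is non-negative and that n_j * n_i fits in an int, as in the loops above.

```c
#include <stddef.h>

/* Hypothetical helper: rebuild the flat index from j and i, mirroring
   the GIMPLE sequence j = tmp / N; i = tmp % N; _4 = j * N + i.  */
size_t split_index(int tmp, int n_i)
{
  int j = tmp / n_i;
  int i = tmp % n_i;
  return (size_t) j * (size_t) n_i + (size_t) i;
}

/* Hypothetical helper: count the values of tmp in [0, n_j * n_i) for
   which the rebuilt index differs from (size_t) tmp.  Assumes that
   n_j * n_i does not overflow int.  */
long count_mismatches(int n_j, int n_i)
{
  long bad = 0;
  for (int tmp = 0; tmp < n_j * n_i; ++tmp)
    if (split_index(tmp, n_i) != (size_t) tmp)
      ++bad;
  return bad;
}
```

Calling count_mismatches (1025, 1025) or count_mismatches (1024, 1024) should return 0, which is consistent with the claim for both the general and the power-of-2 case. It only checks these particular ranges and says nothing about how a match.pd pattern would have to deal with the intermediate conversions.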

--
Marc Glisse
