On Fri, 12 Oct 2018, Thomas Schwinge wrote:
Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:
    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
      {
        int j = tmp / N_I;
        int i = tmp % N_I;
        a[j][i] = 0;
      }
... whereas the following variant (obviously) does vectorize:
    int a[N_J * N_I];
    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
      a[tmp] = 0;
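For anyone who wants to reproduce this, the two loops above can be put into one self-contained file, roughly as sketched below. The array sizes, the function names and the run_both helper are assumptions made for illustration; only the loop bodies come from the report.

```c
#include <assert.h>
#include <string.h>

#define N_J 1024  /* assumed row count, not taken from the report */
#define N_I 1025  /* assumed row length, matching the 1025 seen in the dumps */

int a[N_J][N_I];
int flat[N_J * N_I];

/* Collapsed loop nest: the indices are recovered with / and %.
   This is the form reported as not being vectorized.  */
void clear_2d(void)
{
  for (int tmp = 0; tmp < N_J * N_I; ++tmp)
    {
      int j = tmp / N_I;
      int i = tmp % N_I;
      a[j][i] = 0;
    }
}

/* Flat variant: this is the form reported as being vectorized.  */
void clear_flat(void)
{
  for (int tmp = 0; tmp < N_J * N_I; ++tmp)
    flat[tmp] = 0;
}

/* Hypothetical helper: fill both arrays with a nonzero pattern, run both
   loops, and return how many elements are still nonzero afterwards.  */
long run_both(void)
{
  assert(sizeof a == sizeof flat);
  memset(a, 0xff, sizeof a);
  memset(flat, 0xff, sizeof flat);
  clear_2d();
  clear_flat();

  long bad = 0;
  for (int j = 0; j < N_J; ++j)
    for (int i = 0; i < N_I; ++i)
      bad += (a[j][i] != 0) + (flat[j * N_I + i] != 0);
  return bad;
}
```

Compiling this with something like gcc -O3 -fopt-info-vec-missed should show which of the two loops the vectorizer gives up on; the exact messages depend on the GCC version.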
I had a quick look at the difference, and a[j][i] remains in this form
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get
    j_10 = tmp_17 / 1025;
    i_11 = tmp_17 % 1025;
    _1 = (long unsigned int) j_10;
    _2 = _1 * 1025;
    _3 = (sizetype) i_11;
    _4 = _2 + _3;
or for a power of 2
    j_10 = tmp_17 >> 10;
    i_11 = tmp_17 & 1023;
    _1 = (long unsigned int) j_10;
    _2 = _1 * 1024;
    _3 = (sizetype) i_11;
    _4 = _2 + _3;
and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
I think that's true).
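
The claimed identity can at least be checked by brute force over the loop's range. The sketch below mirrors the statements in the dumps in plain C, with size_t standing in for sizetype. The function name and its parameters are made up for illustration, and it only covers non-negative tmp, so it is a sanity check rather than a proof.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker: for every tmp in [0, limit), recompute the offset
   the way the dumps do (j = tmp / n_i, i = tmp % n_i, then
   (size_t) j * n_i + (size_t) i) and count how often the result differs
   from (size_t) tmp.  */
long count_mismatches(int n_i, int limit)
{
  assert(n_i > 0 && limit >= 0);

  long bad = 0;
  for (int tmp = 0; tmp < limit; ++tmp)
    {
      int j = tmp / n_i;
      int i = tmp % n_i;
      size_t off = (size_t) j * (size_t) n_i + (size_t) i;
      if (off != (size_t) tmp)
        bad++;
    }
  return bad;
}
```

Calling count_mismatches (1025, 1024 * 1025) and count_mismatches (1024, 1024 * 1024) covers the two cases shown above, the second being the power-of-2 one where the division and modulo become a shift and a mask.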
So there are missing match.pd transformations in addition to whatever
scev/ivdep/other work is needed.
--
Marc Glisse