On Fri, Oct 12, 2018 at 2:14 PM Marc Glisse <marc.gli...@inria.fr> wrote:
> On Fri, 12 Oct 2018, Thomas Schwinge wrote:
>
> > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
> > also present when running the following code through the vectorizer:
> >
> >     for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >       {
> >         int j = tmp / N_I;
> >         int i = tmp % N_I;
> >         a[j][i] = 0;
> >       }
> >
> > ... whereas the following variant (obviously) does vectorize:
> >
> >     int a[NJ * NI];
> >
> >     for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >       a[tmp] = 0;
>
> I had a quick look at the difference, and a[j][i] remains in this form
> throughout optimization. If I write instead *((*(a+j))+i) = 0; I get
>
>   j_10 = tmp_17 / 1025;
>   i_11 = tmp_17 % 1025;
>   _1 = (long unsigned int) j_10;
>   _2 = _1 * 1025;
>   _3 = (sizetype) i_11;
>   _4 = _2 + _3;
>
> or for a power of 2
>
>   j_10 = tmp_17 >> 10;
>   i_11 = tmp_17 & 1023;
>   _1 = (long unsigned int) j_10;
>   _2 = _1 * 1024;
>   _3 = (sizetype) i_11;
>   _4 = _2 + _3;
>
> and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
> I think that's true).

If this folding is correct, the dependence analysis would not have to
handle array accesses with div and mod, and it would be able to classify
the loop as parallel which will enable vectorization.

> So there are missing match.pd transformations in addition to whatever
> scev/ivdep/other work is needed.
>
> --
> Marc Glisse
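For what it's worth, the arithmetic identity behind that folding, (tmp / N) * N + tmp % N == tmp, is easy to check outside the compiler. Below is a minimal standalone C sketch, not GCC code: the helper names flat_index_divmod, flat_index_shiftmask and folding_holds are made up for illustration, size_t stands in for sizetype, and only non-negative tmp is considered, as in the loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat offset computed the way the loop body does,
   via a row index (div) and a column index (mod). */
static size_t flat_index_divmod(int tmp, int n_i)
{
    assert(tmp >= 0 && n_i > 0);   /* only the non-negative case is checked */
    int j = tmp / n_i;
    int i = tmp % n_i;
    return (size_t) j * (size_t) n_i + (size_t) i;
}

/* Same computation for the power-of-two case N_I == 1024,
   written with the shift and mask seen in the second dump. */
static size_t flat_index_shiftmask(int tmp)
{
    assert(tmp >= 0);
    int j = tmp >> 10;
    int i = tmp & 1023;
    return (size_t) j * 1024 + (size_t) i;
}

/* Returns 1 if both forms equal (size_t) tmp for every tmp in [0, limit),
   and 0 as soon as one of them differs. */
static int folding_holds(int limit)
{
    for (int tmp = 0; tmp < limit; ++tmp) {
        if (flat_index_divmod(tmp, 1025) != (size_t) tmp)
            return 0;
        if (flat_index_shiftmask(tmp) != (size_t) tmp)
            return 0;
    }
    return 1;
}
```

Calling folding_holds(1025 * 1024) returns 1, which is consistent with _4 == (sizetype) tmp_17 over that range. This only checks the arithmetic; it says nothing about which conditions (value ranges, overflow) a match.pd pattern would have to prove.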