On Mon, Oct 22, 2018 at 6:35 PM Thomas Schwinge <tho...@codesourcery.com> wrote: > > Hi! > > Thanks for all your comments already! I continued looked into this for a > bit (but then got interrupted by a higher-priority task). Regarding this > one specifically: > > On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse <marc.gli...@inria.fr> wrote: > > On Fri, 12 Oct 2018, Thomas Schwinge wrote: > > > > > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is > > > also present when running the following code through the vectorizer: > > > > > > for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > > { > > > int j = tmp / N_I; > > > int i = tmp % N_I; > > > a[j][i] = 0; > > > } > > > > > > ... whereas the following variant (obviously) does vectorize: > > > > > > int a[NJ * NI]; > > > > > > for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > > a[tmp] = 0; > > > > I had a quick look at the difference, and a[j][i] remains in this form > > throughout optimization. If I write instead *((*(a+j))+i) = 0; I get > > > > j_10 = tmp_17 / 1025; > > i_11 = tmp_17 % 1025; > > _1 = (long unsigned int) j_10; > > _2 = _1 * 1025; > > _3 = (sizetype) i_11; > > _4 = _2 + _3; > > > > or for a power of 2 > > > > j_10 = tmp_17 >> 10; > > i_11 = tmp_17 & 1023; > > _1 = (long unsigned int) j_10; > > _2 = _1 * 1024; > > _3 = (sizetype) i_11; > > _4 = _2 + _3; > > > > and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least > > I think that's true). > > > > So there are missing match.pd transformations in addition to whatever > > scev/ivdep/other work is needed. > > With a very simplistic "match.pd" rule (not yet any special cases > checking etc.): > > diff --git gcc/match.pd gcc/match.pd > index b36d7ccb5dc3..4c23116308da 100644 > --- gcc/match.pd > +++ gcc/match.pd > @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > { wide_int_to_tree (sizetype, off); }) > { swap_p ? @0 : @2; })) > { rhs_tree; }))))))))) > + > +/* Given: > + > + j = in / N_I > + i = in % N_I > + > + ..., fold: > + > + out = j * N_I + i > + > + ..., into: > + > + out = in > +*/ > + > +/* As long as only considering N_I being INTEGER_CST (which are always second > + argument?), probably don't need ":c" variants? */ > + > +(simplify > + (plus:c > + (mult:c > + (trunc_div @0 INTEGER_CST@1) > + INTEGER_CST@1) > + (trunc_mod @0 INTEGER_CST@1)) > + (convert @0)) > > ..., the original code: > > int f1(int in) > { > int j = in / N_I; > int i = in % N_I; > > int out = j * N_I + i; > > return out; > } > > ... gets simplified from ("div-mod-0.c.027t.objsz1"): > > f1 (int in) > { > int out; > int i; > int j; > int _1; > int _6; > > <bb 2> : > gimple_assign <trunc_div_expr, j_3, in_2(D), 100, NULL> > gimple_assign <trunc_mod_expr, i_4, in_2(D), 100, NULL> > gimple_assign <mult_expr, _1, j_3, 100, NULL> > gimple_assign <plus_expr, out_5, i_4, _1, NULL> > gimple_assign <ssa_name, _6, out_5, NULL, NULL> > gimple_return <_6> > > } > > ... to ("div-mod-0.c.028t.ccp1"): > > f1 (int in) > { > int out; > int i; > int j; > int _1; > > <bb 2> : > gimple_assign <trunc_div_expr, j_3, in_2(D), 100, NULL> > gimple_assign <trunc_mod_expr, i_4, in_2(D), 100, NULL> > gimple_assign <mult_expr, _1, j_3, 100, NULL> > gimple_return <in_2(D)> > > } > > (The three dead "gimple_assign"s get eliminated later on.) > > So, that works. > > However, it doesn't work yet for the original construct that I'd ran > into, which looks like this: > > [...] > int i; > int j; > [...] > signed int .offset.5_2; > [...] > unsigned int .offset.7_23; > unsigned int .iter.0_24; > unsigned int _25; > unsigned int _26; > [...] > unsigned int .iter.0_32; > [...] > > <bb 9> : > # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> > gimple_assign <nop_expr, .offset.7_23, .offset.5_2, NULL, NULL> > gimple_assign <plus_expr, .iter.0_24, 0, .offset.7_23, NULL> > gimple_assign <trunc_div_expr, _25, .iter.0_24, 100, NULL> > gimple_assign <trunc_mod_expr, _26, .iter.0_24, 100, NULL> > gimple_assign <nop_expr, i_27, _26, NULL, NULL> > gimple_assign <nop_expr, j_28, _25, NULL, NULL> > gimple_assign <integer_cst, a[j_28][i_27], 123, NULL, NULL> > [...] > > Resolving the "a[j][i] = 123" we'll need to look into later. > > As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we > get: > > [...] > int i; > int j; > long unsigned int _1; > long unsigned int _2; > sizetype _3; > sizetype _4; > sizetype _5; > int * _6; > [...] > signed int .offset.5_8; > [...] > unsigned int .offset.7_29; > unsigned int .iter.0_30; > unsigned int _31; > unsigned int _32; > [...] > > <bb 9> : > # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> > gimple_assign <nop_expr, .offset.7_29, .offset.5_8, NULL, NULL> > gimple_assign <plus_expr, .iter.0_30, 0, .offset.7_29, NULL> > gimple_assign <trunc_div_expr, _31, .iter.0_30, 100, NULL> > gimple_assign <trunc_mod_expr, _32, .iter.0_30, 100, NULL> > gimple_assign <nop_expr, i_33, _32, NULL, NULL> > gimple_assign <nop_expr, j_34, _31, NULL, NULL> > gimple_assign <nop_expr, _1, j_34, NULL, NULL> > gimple_assign <mult_expr, _2, _1, 100, NULL> > gimple_assign <nop_expr, _3, i_33, NULL, NULL> > gimple_assign <plus_expr, _4, _2, _3, NULL> > gimple_assign <mult_expr, _5, _4, 4, NULL> > gimple_assign <pointer_plus_expr, _6, &a, _5, NULL> > gimple_assign <integer_cst, *_6, 123, NULL, NULL> > [...] > > Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", > but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which > I have not yet figured out how to handle, I suppose? I tried some things > but couldn't get it to work. Apparently the existing instances of > "(match (nop_convert @0)" and "Basic strip-useless-type-conversions / > strip_nops" rule also don't handle these; should they? Or, are in fact > here the types mixed up too much?
There are sign-extensions to larger types involved which are not no-op ones. So it's a little bit more complicated than the case you are catching with the pattern. Richard. > I hope to get some time again soon to continue looking into this, but if > anybody got any ideas, I'm all ears. > > > Grüße > Thomas