Hi!

Thanks for all your comments already!  I continued looked into this for a
bit (but then got interrupted by a higher-priority task).  Regarding this
one specifically:

On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse <marc.gli...@inria.fr> wrote:
> On Fri, 12 Oct 2018, Thomas Schwinge wrote:
> 
> > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
> > also present when running the following code through the vectorizer:
> >
> >    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >      {
> >        int j = tmp / N_I;
> >        int i = tmp % N_I;
> >        a[j][i] = 0;
> >      }
> >
> > ... whereas the following variant (obviously) does vectorize:
> >
> >    int a[NJ * NI];
> >
> >    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >      a[tmp] = 0;
> 
> I had a quick look at the difference, and a[j][i] remains in this form 
> throughout optimization. If I write instead *((*(a+j))+i) = 0; I get
> 
>    j_10 = tmp_17 / 1025;
>    i_11 = tmp_17 % 1025;
>    _1 = (long unsigned int) j_10;
>    _2 = _1 * 1025;
>    _3 = (sizetype) i_11;
>    _4 = _2 + _3;
> 
> or for a power of 2
> 
>    j_10 = tmp_17 >> 10;
>    i_11 = tmp_17 & 1023;
>    _1 = (long unsigned int) j_10;
>    _2 = _1 * 1024;
>    _3 = (sizetype) i_11;
>    _4 = _2 + _3;
> 
> and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least 
> I think that's true).
> 
> So there are missing match.pd transformations in addition to whatever 
> scev/ivdep/other work is needed.

With a very simplistic "match.pd" rule (not yet any special cases
checking etc.):

diff --git gcc/match.pd gcc/match.pd
index b36d7ccb5dc3..4c23116308da 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
                            { wide_int_to_tree (sizetype, off); })
              { swap_p ? @0 : @2; }))
            { rhs_tree; })))))))))
+
+/* Given:
+
+       j = in / N_I
+       i = in % N_I
+
+   ..., fold:
+
+       out = j * N_I + i
+
+   ..., into:
+
+       out = in
+*/
+
+/* As long as only considering N_I being INTEGER_CST (which are always second
+   argument?), probably don't need ":c" variants?  */
+
+(simplify
+ (plus:c
+  (mult:c
+   (trunc_div @0 INTEGER_CST@1)
+   INTEGER_CST@1)
+  (trunc_mod @0 INTEGER_CST@1))
+ (convert @0))

..., the original code:

    int f1(int in)
    {
      int j = in / N_I;
      int i = in % N_I;
    
      int out = j * N_I + i;
    
      return out;
    }

... gets simplified from ("div-mod-0.c.027t.objsz1"):

    f1 (int in)
    {
      int out;
      int i;
      int j;
      int _1;
      int _6;
    
      <bb 2> :
      gimple_assign <trunc_div_expr, j_3, in_2(D), 100, NULL>
      gimple_assign <trunc_mod_expr, i_4, in_2(D), 100, NULL>
      gimple_assign <mult_expr, _1, j_3, 100, NULL>
      gimple_assign <plus_expr, out_5, i_4, _1, NULL>
      gimple_assign <ssa_name, _6, out_5, NULL, NULL>
      gimple_return <_6>
    
    }

... to ("div-mod-0.c.028t.ccp1"):

    f1 (int in)
    {
      int out;
      int i;
      int j;
      int _1;
    
      <bb 2> :
      gimple_assign <trunc_div_expr, j_3, in_2(D), 100, NULL>
      gimple_assign <trunc_mod_expr, i_4, in_2(D), 100, NULL>
      gimple_assign <mult_expr, _1, j_3, 100, NULL>
      gimple_return <in_2(D)>
    
    }

(The three dead "gimple_assign"s get eliminated later on.)

So, that works.

However, it doesn't work yet for the original construct that I'd ran
into, which looks like this:

    [...]
    int i;
    int j;
    [...]
    signed int .offset.5_2;
    [...]
    unsigned int .offset.7_23;
    unsigned int .iter.0_24;
    unsigned int _25;
    unsigned int _26;
    [...]
    unsigned int .iter.0_32;
    [...]
    
    <bb 9> :
    # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)>
    gimple_assign <nop_expr, .offset.7_23, .offset.5_2, NULL, NULL>
    gimple_assign <plus_expr, .iter.0_24, 0, .offset.7_23, NULL>
    gimple_assign <trunc_div_expr, _25, .iter.0_24, 100, NULL>
    gimple_assign <trunc_mod_expr, _26, .iter.0_24, 100, NULL>
    gimple_assign <nop_expr, i_27, _26, NULL, NULL>
    gimple_assign <nop_expr, j_28, _25, NULL, NULL>
    gimple_assign <integer_cst, a[j_28][i_27], 123, NULL, NULL>
    [...]

Resolving the "a[j][i] = 123" we'll need to look into later.

As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we
get:

    [...]
    int i;
    int j;
    long unsigned int _1;
    long unsigned int _2;
    sizetype _3;
    sizetype _4;
    sizetype _5;
    int * _6;
    [...]
    signed int .offset.5_8;
    [...]
    unsigned int .offset.7_29;
    unsigned int .iter.0_30;
    unsigned int _31;
    unsigned int _32;
    [...]
    
    <bb 9> :
    # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)>
    gimple_assign <nop_expr, .offset.7_29, .offset.5_8, NULL, NULL>
    gimple_assign <plus_expr, .iter.0_30, 0, .offset.7_29, NULL>
    gimple_assign <trunc_div_expr, _31, .iter.0_30, 100, NULL>
    gimple_assign <trunc_mod_expr, _32, .iter.0_30, 100, NULL>
    gimple_assign <nop_expr, i_33, _32, NULL, NULL>
    gimple_assign <nop_expr, j_34, _31, NULL, NULL>
    gimple_assign <nop_expr, _1, j_34, NULL, NULL>
    gimple_assign <mult_expr, _2, _1, 100, NULL>
    gimple_assign <nop_expr, _3, i_33, NULL, NULL>
    gimple_assign <plus_expr, _4, _2, _3, NULL>
    gimple_assign <mult_expr, _5, _4, 4, NULL>
    gimple_assign <pointer_plus_expr, _6, &a, _5, NULL>
    gimple_assign <integer_cst, *_6, 123, NULL, NULL>
    [...]

Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30",
but "match.pd" doesn't agree yet.  Note the many "nop_expr"s here, which
I have not yet figured out how to handle, I suppose?  I tried some things
but couldn't get it to work.  Apparently the existing instances of
"(match (nop_convert @0)" and "Basic strip-useless-type-conversions /
strip_nops" rule also don't handle these; should they?  Or, are in fact
here the types mixed up too much?

I hope to get some time again soon to continue looking into this, but if
anybody got any ideas, I'm all ears.


Grüße
 Thomas

Reply via email to