Hello,

On Fri, 22 Sep 2017, Bin.Cheng wrote:

> This is updated patch for loop interchange with review suggestions
> resolved.  Changes are:
>   1) It does more light weight checks like rectangle loop nest check
> earlier than before.
>   2) It checks profitability of interchange before data dependence 
> computation.
>   3) It calls find_data_references_in_loop only once for a loop nest now.
>   4) Data dependence is open-computed so that we can skip instantly at
> unknown dependence.
>   5) It improves code generation in mapping induction variables for
> loop nest, as well as
>      adding a simple dead code elimination pass.
>   6) It changes magic constants into parameters.

So I have a couple comments/questions.  Something stylistic:

> +class loop_cand
> +{
> +public:
> ...
> +  friend class tree_loop_interchange;
> +private:

Just make this all public (and hence a struct, not class).
No need for friends in file local classes.

> +single_use_in_loop (tree var, struct loop *loop)
> ...
> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
> +    {
> +      stmt = USE_STMT (use_p);
> ...
> +      basic_block bb = gimple_bb (stmt);
> +      gcc_assert (bb != NULL);

This pattern reoccurs often in your patch: you check for a bb associated 
for a USE_STMT.  Uses of SSA names always occur in basic blocks, no need 
for checking.

Then, something about your handling of simple reductions:

> +void
> +loop_cand::classify_simple_reduction (reduction_p re)
> +{
> ...
> +  /* Require memory references in producer and consumer are the same so
> +     that we can undo reduction during interchange.  */
> +  if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
> +    return;

Where is it checked that the undoing transformation is legal also
from a data dep point of view?  Think code like this:

   sum = X[i];
   for (j ...)
     sum += X[j];
   X[i] = sum;

Moving the store into the inner loop isn't always correct and I don't seem 
to find where the above situation is rejected.

Maybe I'm confused because I also don't see where you even can get into 
the above situation (though I do see testcases about this).  The thing is, 
for an 2d loop nest to contain something like the above reduction it can't 
be perfect:

   for (j) {
     int sum = X[j];      // 1
     for (i)
       sum += Y[j][i];
     X[j] = sum;          // 2
   }

But you do check for perfectness in proper_loop_form_for_interchange and 
prepare_perfect_loop_nest, so either you can't get into the situation or 
the checking can't be complete, or you define the above to be perfect 
nevertheless (probably because the load and store are in outer loop 
header/exit blocks?).  The latter would mean that you accept also other 
code in header/footer of loops from a pure CFG perspective, so where is it 
checked that that other code (which aren't simple reductions) isn't 
harmful to the transformation?

Then, the data dependence part of the new pass:

> +bool
> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned 
> outer)
> +{
> +  struct data_dependence_relation *ddr;
> +
> +  for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
> +    {
> +      /* Skip no-dependence case.  */
> +      if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> +     continue;
> +
> +      for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
> +     {
> +       lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
> +       unsigned level = dependence_level (dist_vect, loop_nest.length ());
> +
> +       /* If there is no carried dependence.  */
> +       if (level == 0)
> +         continue;
> +
> +       level --;
> +       /* Skip case which has '>' as the leftmost direction.  */
> +       if (!lambda_vector_lexico_pos (dist_vect, level))
> +         return false;

Shouldn't happen as dist vectors are forced positive via DDR_REVERSED.

> +       /* If dependence is carried by outer loop of the two loops for
> +          interchange.  */
> +       if (level < outer)
> +         continue;
> +
> +       lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
> +       /* If directions at both inner/outer levels are the same.  */
> +       if (dir_vect[inner] == dir_vect[outer])
> +         continue;
> +
> +       /* Be conservative, skip case if either direction at inner/outer
> +          levels is not '=' or '<'.  */
> +       if (dir_vect[inner] != dir_equal
> +           && dir_vect[inner] != dir_positive
> +           && dir_vect[inner] != dir_independent
> +           && dir_vect[inner] != dir_positive_or_equal)
> +         return false;
> +
> +       if (dir_vect[outer] != dir_equal
> +           && dir_vect[outer] != dir_positive
> +           && dir_vect[outer] != dir_independent
> +           && dir_vect[outer] != dir_positive_or_equal)
> +         return false;

Checking dir vectors doesn't make much sense in GCC: the elements are only
ever set to dir_positive, dir_negative or dir_equal, exactly when distance 
is
 > 0, < 0 or == 0.  So checking dist vector is enough. (though sameness of
direction checks sameness of sign with zero).  Incidentally:

> +tree_loop_interchange::update_data_deps (unsigned inner, unsigned outer)
> +{
> +  struct data_dependence_relation *ddr;
> +
> +  for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
> +    {
> +      /* Skip no-dependence case.  */
> +      if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> +     continue;
> +
> +      for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
> +     {
> +       lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
> +       std::swap (dir_vect[inner], dir_vect[outer]);
> +     }
> +    }
> +}

Here you swap only the direction but not the distance vector, which can't 
be right.  I suggest only using (and updating) the distance vector.

And then your usage and update of DR_ACCESS_FNs: there's quite some 
complexity connected with that and I'm not sure how worthwhile it is.  
You're basically using the ACCESS_FNs to determine profitability (and not 
for validity, and that's good).  But e.g. for pointer based accesses like 
in fortran with explicit address arithmetic the relation between access-fn 
step and stride and actual access stride isn't that easy (e.g. in your 
should_interchange_loops function iloop_stride and oloop_stride will 
always be one for pointer based accesses).

Conceptually what you should check is how the access address for each data 
ref revolves for each loop, so why not doing this explicitely?  What I 
mean is: calculate a (complicated) chrec for the DR addresses for the 
whole nest at the beginning.  It should be in the form like (assume "+"
always):

  {{{init, s1}_l1, s2}_l2, s3}_l3

(i.e. all steps should be invariants/constants, and only one non-chrec 
init value).  Addresses which aren't in this form you're already ignoring 
right now, so you could continue doing that.  (Or better said, all 
non-constant steps you regard as being AVG_DIM_SIZE, which you still can 
continue doing).

Now, with the above form you can form expressions for the difference 
between addresses per iteration for each loop (i.e. the address stride per 
loop); store these.  Then, when interchanging loops you need to merely 
swap these expressions like you have to with the distance vector, instead 
of fiddling inside the DR_ACCESS_FNs themself.  Much code would go away.

Testcases: given that we had to remove our old separate interchange pass 
because it miscompiled stuff all over I'm missing some testcases where 
interchange should _not_ happen for validity reasons, like my above 
example with an reduction that can't be moved inside.  Perhaps you can 
think of some more.

I hope this is of some help to you :)


Ciao,
Michael.

Reply via email to