Hello, On Fri, 22 Sep 2017, Bin.Cheng wrote:
> This is updated patch for loop interchange with review suggestions > resolved. Changes are: > 1) It does more light weight checks like rectangle loop nest check > earlier than before. > 2) It checks profitability of interchange before data dependence > computation. > 3) It calls find_data_references_in_loop only once for a loop nest now. > 4) Data dependence is open-computed so that we can skip instantly at > unknown dependence. > 5) It improves code generation in mapping induction variables for > loop nest, as well as > adding a simple dead code elimination pass. > 6) It changes magic constants into parameters. So I have a couple comments/questions. Something stylistic: > +class loop_cand > +{ > +public: > ... > + friend class tree_loop_interchange; > +private: Just make this all public (and hence a struct, not class). No need for friends in file local classes. > +single_use_in_loop (tree var, struct loop *loop) > ... > + FOR_EACH_IMM_USE_FAST (use_p, iterator, var) > + { > + stmt = USE_STMT (use_p); > ... > + basic_block bb = gimple_bb (stmt); > + gcc_assert (bb != NULL); This pattern reoccurs often in your patch: you check for a bb associated for a USE_STMT. Uses of SSA names always occur in basic blocks, no need for checking. Then, something about your handling of simple reductions: > +void > +loop_cand::classify_simple_reduction (reduction_p re) > +{ > ... > + /* Require memory references in producer and consumer are the same so > + that we can undo reduction during interchange. */ > + if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0)) > + return; Where is it checked that the undoing transformation is legal also from a data dep point of view? Think code like this: sum = X[i]; for (j ...) sum += X[j]; X[i] = sum; Moving the store into the inner loop isn't always correct and I don't seem to find where the above situation is rejected. Maybe I'm confused because I also don't see where you even can get into the above situation (though I do see testcases about this). The thing is, for an 2d loop nest to contain something like the above reduction it can't be perfect: for (j) { int sum = X[j]; // 1 for (i) sum += Y[j][i]; X[j] = sum; // 2 } But you do check for perfectness in proper_loop_form_for_interchange and prepare_perfect_loop_nest, so either you can't get into the situation or the checking can't be complete, or you define the above to be perfect nevertheless (probably because the load and store are in outer loop header/exit blocks?). The latter would mean that you accept also other code in header/footer of loops from a pure CFG perspective, so where is it checked that that other code (which aren't simple reductions) isn't harmful to the transformation? Then, the data dependence part of the new pass: > +bool > +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned > outer) > +{ > + struct data_dependence_relation *ddr; > + > + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i) > + { > + /* Skip no-dependence case. */ > + if (DDR_ARE_DEPENDENT (ddr) == chrec_known) > + continue; > + > + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j) > + { > + lambda_vector dist_vect = DDR_DIST_VECT (ddr, j); > + unsigned level = dependence_level (dist_vect, loop_nest.length ()); > + > + /* If there is no carried dependence. */ > + if (level == 0) > + continue; > + > + level --; > + /* Skip case which has '>' as the leftmost direction. */ > + if (!lambda_vector_lexico_pos (dist_vect, level)) > + return false; Shouldn't happen as dist vectors are forced positive via DDR_REVERSED. > + /* If dependence is carried by outer loop of the two loops for > + interchange. */ > + if (level < outer) > + continue; > + > + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j); > + /* If directions at both inner/outer levels are the same. */ > + if (dir_vect[inner] == dir_vect[outer]) > + continue; > + > + /* Be conservative, skip case if either direction at inner/outer > + levels is not '=' or '<'. */ > + if (dir_vect[inner] != dir_equal > + && dir_vect[inner] != dir_positive > + && dir_vect[inner] != dir_independent > + && dir_vect[inner] != dir_positive_or_equal) > + return false; > + > + if (dir_vect[outer] != dir_equal > + && dir_vect[outer] != dir_positive > + && dir_vect[outer] != dir_independent > + && dir_vect[outer] != dir_positive_or_equal) > + return false; Checking dir vectors doesn't make much sense in GCC: the elements are only ever set to dir_positive, dir_negative or dir_equal, exactly when distance is > 0, < 0 or == 0. So checking dist vector is enough. (though sameness of direction checks sameness of sign with zero). Incidentally: > +tree_loop_interchange::update_data_deps (unsigned inner, unsigned outer) > +{ > + struct data_dependence_relation *ddr; > + > + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i) > + { > + /* Skip no-dependence case. */ > + if (DDR_ARE_DEPENDENT (ddr) == chrec_known) > + continue; > + > + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j) > + { > + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j); > + std::swap (dir_vect[inner], dir_vect[outer]); > + } > + } > +} Here you swap only the direction but not the distance vector, which can't be right. I suggest only using (and updating) the distance vector. And then your usage and update of DR_ACCESS_FNs: there's quite some complexity connected with that and I'm not sure how worthwhile it is. You're basically using the ACCESS_FNs to determine profitability (and not for validity, and that's good). But e.g. for pointer based accesses like in fortran with explicit address arithmetic the relation between access-fn step and stride and actual access stride isn't that easy (e.g. in your should_interchange_loops function iloop_stride and oloop_stride will always be one for pointer based accesses). Conceptually what you should check is how the access address for each data ref revolves for each loop, so why not doing this explicitely? What I mean is: calculate a (complicated) chrec for the DR addresses for the whole nest at the beginning. It should be in the form like (assume "+" always): {{{init, s1}_l1, s2}_l2, s3}_l3 (i.e. all steps should be invariants/constants, and only one non-chrec init value). Addresses which aren't in this form you're already ignoring right now, so you could continue doing that. (Or better said, all non-constant steps you regard as being AVG_DIM_SIZE, which you still can continue doing). Now, with the above form you can form expressions for the difference between addresses per iteration for each loop (i.e. the address stride per loop); store these. Then, when interchanging loops you need to merely swap these expressions like you have to with the distance vector, instead of fiddling inside the DR_ACCESS_FNs themself. Much code would go away. Testcases: given that we had to remove our old separate interchange pass because it miscompiled stuff all over I'm missing some testcases where interchange should _not_ happen for validity reasons, like my above example with an reduction that can't be moved inside. Perhaps you can think of some more. I hope this is of some help to you :) Ciao, Michael.