On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > on 2021/8/3 下午8:08, Richard Biener wrote: > > On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> on 2021/7/29 下午4:01, Richard Biener wrote: > >>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >>>> > >>>> on 2021/7/22 下午8:56, Richard Biener wrote: > >>>>> On Tue, Jul 20, 2021 at 4:37 > >>>>> PM Kewen.Lin <li...@linux.ibm.com> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> This v2 has addressed some review comments/suggestions: > >>>>>> > >>>>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) > >>>>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) > >>>>>> to support loop hierarchy tree rather than just a function, > >>>>>> and adjust to use loops* accordingly. > >>>>> > >>>>> I actually meant struct loop *, not struct loops * ;) At the point > >>>>> we pondered to make loop invariant motion work on single > >>>>> loop nests we gave up not only but also because it iterates > >>>>> over the loop nest but all the iterators only ever can process > >>>>> all loops, not say, all loops inside a specific 'loop' (and > >>>>> including that 'loop' if LI_INCLUDE_ROOT). So the > >>>>> CTOR would take the 'root' of the loop tree as argument. > >>>>> > >>>>> I see that doesn't trivially fit how loops_list works, at least > >>>>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST > >>>>> could be adjusted to do ONLY_INNERMOST as well? > >>>>> > >>>> > >>>> > >>>> Thanks for the clarification! I just realized that the previous > >>>> version with struct loops* is problematic, all traversal is > >>>> still bounded with outer_loop == NULL. I think what you expect > >>>> is to respect the given loop_p root boundary. Since we just > >>>> record the loops' nums, I think we still need the function* fn? > >>> > >>> Would it simplify things if we recorded the actual loop *? > >>> > >> > >> I'm afraid it's unsafe to record the loop*. I had the same > >> question why the loop iterator uses index rather than loop* when > >> I read this at the first time. I guess the design of processing > >> loops allows its user to update or even delete the folllowing > >> loops to be visited. For example, when the user does some tricks > >> on one loop, then it duplicates the loop and its children to > >> somewhere and then removes the loop and its children, when > >> iterating onto its children later, the "index" way will check its > >> validity by get_loop at that point, but the "loop *" way will > >> have some recorded pointers to become dangling, can't do the > >> validity check on itself, seems to need a side linear search to > >> ensure the validity. > >> > >>> There's still the to_visit reserve which needs a bound on > >>> the number of loops for efficiency reasons. > >>> > >> > >> Yes, I still keep the fn in the updated version. > >> > >>>> So I add one optional argument loop_p root and update the > >>>> visiting codes accordingly. Before this change, the previous > >>>> visiting uses the outer_loop == NULL as the termination condition, > >>>> it perfectly includes the root itself, but with this given root, > >>>> we have to use it as the termination condition to avoid to iterate > >>>> onto its possible existing next. > >>>> > >>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the > >>>> code like: > >>>> > >>>> struct loops *fn_loops = loops_for_fn (fn)->larray; > >>>> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) > >>>> if (aloop != NULL > >>>> && aloop->inner == NULL > >>>> && flow_loop_nested_p (tree_root, aloop)) > >>>> this->to_visit.quick_push (aloop->num); > >>>> > >>>> it has the stable bound, but if the given root only has several > >>>> child loops, it can be much worse if there are many loops in fn. > >>>> It seems impossible to predict the given root loop hierarchy size, > >>>> maybe we can still use the original linear searching for the case > >>>> loops_for_fn (fn) == root? But since this visiting seems not so > >>>> performance critical, I chose to share the code originally used > >>>> for FROM_INNERMOST, hope it can have better readability and > >>>> maintainability. > >>> > >>> I was indeed looking for something that has execution/storage > >>> bound on the subtree we're interested in. If we pull the CTOR > >>> out-of-line we can probably keep the linear search for > >>> LI_ONLY_INNERMOST when looking at the whole loop tree. > >>> > >> > >> OK, I've moved the suggested single loop tree walker out-of-line > >> to cfgloop.c, and brought the linear search back for > >> LI_ONLY_INNERMOST when looking at the whole loop tree. > >> > >>> It just seemed to me that we can eventually re-use a > >>> single loop tree walker for all orders, just adjusting the > >>> places we push. > >>> > >> > >> Wow, good point! Indeed, I have further unified all orders > >> handlings into a single function walk_loop_tree. > >> > >>>> > >>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9, > >>>> x86_64-redhat-linux and aarch64-linux-gnu, also > >>>> bootstrapped on ppc64le P9 with bootstrap-O3 config. > >>>> > >>>> Does the attached patch meet what you expect? > >>> > >>> So yeah, it's probably close to what is sensible. Not sure > >>> whether optimizing the loops for the !only_push_innermost_p > >>> case is important - if we manage to produce a single > >>> walker with conditionals based on 'flags' then IPA-CP should > >>> produce optimal clones as well I guess. > >>> > >> > >> Thanks for the comments, the updated v2 is attached. > >> Comparing with v1, it does: > >> > >> - Unify one single loop tree walker for all orders. > >> - Move walk_loop_tree out-of-line to cfgloop.c. > >> - Keep the linear search for LI_ONLY_INNERMOST with > >> tree_root of fn loops. > >> - Use class loop * instead of loop_p. > >> > >> Bootstrapped & regtested on powerpc64le-linux-gnu Power9 > >> (with/without the hunk for LI_ONLY_INNERMOST linear search, > >> it can have the coverage to exercise LI_ONLY_INNERMOST > >> in walk_loop_tree when "without"). > >> > >> Is it ok for trunk? > > > > Looks good to me. I think that the 'mn' was an optimization > > for the linear walk and it's cheaper to pointer test against > > the actual 'root' loop (no need to dereference). Thus > > > > + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) > > { > > - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); > > i++) > > + class loop *aloop; > > + unsigned int i; > > + for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++) > > if (aloop != NULL > > && aloop->inner == NULL > > - && aloop->num >= mn) > > + && aloop->num != mn) > > this->to_visit.quick_push (aloop->num); > > > > could elide the aloop->num != mn check and start iterating from 1, > > since loops->tree_root->num == 0 > > > > and the walk_loop_tree could simply do > > > > class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; > > > > and pointer test aloop against exclude. That avoids the idea that > > 'mn' is a vehicle to exclude one random loop from the iteration. > > > > Good idea! Thanks for the comments! The attached v3 has addressed > the review comments on "mn". > > Bootstrapped & regtested again on powerpc64le-linux-gnu Power9 > (with/without the hunk for LI_ONLY_INNERMOST linear search). > > Is it ok for trunk?
+ /* Early handle root without any inner loops, make later + processing simpler, that is all loops processed in the + following while loop are impossible to be root. */ + if (!root->inner) + { + if (root != exclude) + this->to_visit.quick_push (root->num); + return; + } could be if (!root->inner) { if (flags & LI_INCLUDE_ROOT) this->to_visit.quick_push (root->num); } + class loop *aloop; + for (aloop = root; + aloop->inner != NULL; + aloop = aloop->inner) + { + if (preorder_p && aloop != exclude) + this->to_visit.quick_push (aloop->num); + continue; + } could be + class loop *aloop; + for (aloop = root->inner; + aloop->inner != NULL; + aloop = aloop->inner) + { + if (preorder_p) + this->to_visit.quick_push (aloop->num); + continue; + } + /* When visiting from innermost, we need to consider root here + since the previous while loop doesn't handle it. */ + if (from_innermost_p && root != exclude) + this->to_visit.quick_push (root->num); could be like the first. I think that's more clear even. Sorry for finding a better solution again. OK with that change Richard. > BR, > Kewen > ----- > gcc/ChangeLog: > > * cfgloop.h (loops_list::loops_list): Add one optional argument root > and adjust accordingly, update loop tree walking and factor out > to ... > * cfgloop.c (loops_list::walk_loop_tree): ...this. New function.