On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen <kdnil...@linux.ibm.com> wrote:
> A somewhat old "issue report" pointed me to the code generated for a 4-fold
> manually unrolled version of the following loop:
>
>>       while (++len != len_limit) /* this is loop */
>>         if (pb[len] != cur[len])
>>           break;
>
> As unrolled, the loop appears as:
>
>>       while (++len != len_limit) /* this is loop */ {
>>         if (pb[len] != cur[len])
>>           break;
>>         if (++len == len_limit)  /* unrolled 2nd iteration */
>>           break;
>>         if (pb[len] != cur[len])
>>           break;
>>         if (++len == len_limit)  /* unrolled 3rd iteration */
>>           break;
>>         if (pb[len] != cur[len])
>>           break;
>>         if (++len == len_limit)  /* unrolled 4th iteration */
>>           break;
>>         if (pb[len] != cur[len])
>>           break;
>>       }
>
> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only
> induction variable candidates that are being considered are all forms of the
> len variable.  We are not considering any induction variables to represent
> the address expressions &pb[len] and &cur[len].
>
> I rewrote the source code for this loop to make the addressing expressions
> more explicit, as in the following:
>
>>       cur++;
>>       while (++pb != last_pb) /* this is loop */ {
>>         if (*pb != *cur)
>>           break;
>>         ++cur;
>>         if (++pb == last_pb)  /* unrolled 2nd iteration */
>>           break;
>>         if (*pb != *cur)
>>           break;
>>         ++cur;
>>         if (++pb == last_pb)  /* unrolled 3rd iteration */
>>           break;
>>         if (*pb != *cur)
>>           break;
>>         ++cur;
>>         if (++pb == last_pb)  /* unrolled 4th iteration */
>>           break;
>>         if (*pb != *cur)
>>           break;
>>         ++cur;
>>       }
>
> Now, gcc does a better job of identifying the "address expression induction
> variables".  This version of the loop runs about 10% faster than the original
> on my target architecture.
>
> This would seem to be a textbook pattern for the induction variable analysis.
> Does anyone have any thoughts on the best way to add these candidates to the
> set of induction variables that are considered by tree-ssa-loop-ivopts.c?
>
> Thanks in advance for any suggestions.
>

Hi,
Could you please file a bug with your original slow test code attached?  I
tried to construct a meaningful test case from your code snippet but was not
successful.  There is a difference in the generated assembly, but it is not
that fundamental, so a bug report with a preprocessed test case would be
highly appreciated.
I think there are two potential issues in the cost computation for such a
case: invariant expressions, and IV uses outside of the loop being handled
as inside uses.
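
For reference, the two loop shapes from the quoted message could be wrapped
into standalone functions roughly as follows.  This is only a sketch under
stated assumptions, not the original test code: the function names, the
size_t and unsigned char types, and the entry assertions are hypothetical
and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: the 4-fold unrolled loop as quoted, using pb[len] and
   cur[len].  Names and types are assumptions, not the original code.  */
size_t
match_len_indexed (const unsigned char *pb, const unsigned char *cur,
                   size_t len, size_t len_limit)
{
  assert (len < len_limit);     /* assumed precondition */
  while (++len != len_limit)
    {
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)   /* unrolled 2nd iteration */
        break;
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)   /* unrolled 3rd iteration */
        break;
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)   /* unrolled 4th iteration */
        break;
      if (pb[len] != cur[len])
        break;
    }
  return len;
}

/* Pointer form: the same loop with the address expressions made
   explicit.  The caller-side setup (base, last_pb) is an assumption.  */
size_t
match_len_pointer (const unsigned char *pb, const unsigned char *cur,
                   size_t len, size_t len_limit)
{
  const unsigned char *base = pb;
  const unsigned char *last_pb = pb + len_limit;

  assert (len < len_limit);     /* assumed precondition */
  pb += len;
  cur += len;

  cur++;
  while (++pb != last_pb)
    {
      if (*pb != *cur)
        break;
      ++cur;
      if (++pb == last_pb)      /* unrolled 2nd iteration */
        break;
      if (*pb != *cur)
        break;
      ++cur;
      if (++pb == last_pb)      /* unrolled 3rd iteration */
        break;
      if (*pb != *cur)
        break;
      ++cur;
      if (++pb == last_pb)      /* unrolled 4th iteration */
        break;
      if (*pb != *cur)
        break;
      ++cur;
    }
  return (size_t) (pb - base);
}
```

Both functions should return the same value for the same inputs, so
comparing the assembly generated for the two loops would show which
induction variables ivopts selected in each case.  Whether this sketch
reproduces the reported slowdown is not known.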

Thanks,
bin