On Mon, 2020-07-06 at 15:13 +0800, guojiufu via Gcc-patches wrote: Hi,
Assorted comments below. thanks :-) > For very small loops (< 6 insns), it would be fine to unroll 4 > times to use cache line better. Like below loops: > `while (i) a[--i] = NULL; while (p < e) *d++ = *p++;` > > And for very complex loops which may cause negative impacts: > branch-miss or cache-miss. Like below loop: there are calls, > early exits and branches in loop. > ``` > for (int i = 0; i < n; i++) { > int e = a[I]; > .... > if (function_call(e)) break; > .... > } > ``` > > This patch enhances RTL unroll for small loops and prevent to > unroll complex loops. ok. > > gcc/ChangeLog > 2020-07-03 Jiufu Guo <guoji...@linux.ibm.com> > > * config/rs6000/rs6000.c (rs6000_loop_unroll_adjust): Refine hook. > (rs6000_complex_loop_p): New function. > (num_loop_calls): New function. Tabs versus spaces. (num_loop_calls): New function. > --- > gcc/config/rs6000/rs6000.c | 46 +++++++++++++++++++++++++++++++++--- > -- > 1 file changed, 40 insertions(+), 6 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 58f5d780603..a4874fa0efc 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -5130,22 +5130,56 @@ rs6000_destroy_cost_data (void *data) > free (data); > } > > +/* Count the number of call insns in LOOP. */ > +static unsigned int > +num_loop_calls (struct loop *loop) > +{ > + basic_block *bbs; > + rtx_insn *insn; > + unsigned int i; > + unsigned int call_ins_num = 0; > + > + bbs = get_loop_body (loop); > + for (i = 0; i < loop->num_nodes; i++) > + FOR_BB_INSNS (bbs[i], insn) > + if (CALL_P (insn)) > + call_ins_num++; > + > + free (bbs); > + > + return call_ins_num; > +} ok. > + > +/* Return true if LOOP is too complex to be unrolled. */ > +static bool > +rs6000_complex_loop_p (struct loop *loop) > +{ > + unsigned call_num; > + > + return loop->ninsns > 10 > + && (call_num = num_loop_calls (loop)) > 0 > + && (call_num + num_loop_branches (loop)) * 5 > loop->ninsns > + && !single_exit (loop); > +} > + The assignment to call_num within the logic there concerns me. I'd break that out. The 5 value is not explicitly mentioned elsewhere. Contextually this appears to be evaluating the ratio of branches versus instructions within the loop. Could use some clarity. > /* Implement targetm.loop_unroll_adjust. */ > > static unsigned > rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop) > { > - if (unroll_only_small_loops) > + if (unroll_only_small_loops) indentation fix looks ok. > { > - /* TODO: This is hardcoded to 10 right now. It can be refined, for > - example we may want to unroll very small loops more times (4 perhaps). > - We also should use a PARAM for this. */ Still hardcoded values, and may still wish to eventually have this as a tunable param. Probably OK to drop the 2nd sentence, but first and last sentences should probably stay. > + if (loop->ninsns <= 6) > + return MIN (4, nunroll); > if (loop->ninsns <= 10) > return MIN (2, nunroll); > - else > - return 0; > + > + return 0; > } ok > > + if (rs6000_complex_loop_p (loop)) > + return 0; > + > return nunroll; > } > ok