Hi,

For very small loops (<= 6 insns), it is fine to unroll 4 times: they run
faster, with less latency and better cache usage.  For example, loops like:

  while (i) a[--i] = NULL;

  while (p < e) *d++ = *p++;

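To illustrate what unrolling 4 times means for the second loop above, here is a hand-written C sketch.  The function names copy_rolled and copy_unrolled4 and the int element type are assumptions made only for this illustration; the RTL unroller works on insns and emits its own form, which need not look like this source.

```c
#include <stddef.h>

/* Rolled form: one element copied per iteration, one compare-and-branch
   per element.  */
static void
copy_rolled (int *d, const int *p, const int *e)
{
  while (p < e)
    *d++ = *p++;
}

/* Hypothetical hand-unrolled form: four elements per iteration of the
   main loop, so the loop overhead is paid once per four copies.  */
static void
copy_unrolled4 (int *d, const int *p, const int *e)
{
  /* Main loop runs while at least four elements remain.  */
  while (e - p >= 4)
    {
      d[0] = p[0];
      d[1] = p[1];
      d[2] = p[2];
      d[3] = p[3];
      d += 4;
      p += 4;
    }

  /* Remainder loop handles the last 0 to 3 elements.  */
  while (p < e)
    *d++ = *p++;
}
```

Both functions copy the same elements; only the amount of branch and counter overhead per element differs.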
With this patch, we see some performance improvement on some workloads
(e.g. SPEC2017).

Bootstrap and regtest pass on powerpc64le.  Ok for trunk?

BR,
Jiufu Guo

2020-07-13  Jiufu Guo  <guoji...@cn.ibm.com>

        * config/rs6000/rs6000.c (rs6000_loop_unroll_adjust): Refine hook.

---
 gcc/config/rs6000/rs6000.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..06844fdba57 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5135,16 +5135,15 @@ rs6000_destroy_cost_data (void *data)
 static unsigned
 rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
 {
-   if (unroll_only_small_loops)
+  if (unroll_only_small_loops)
     {
-      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
-         example we may want to unroll very small loops more times (4 perhaps).
-         We also should use a PARAM for this.  */
+      /* TODO: Using hardcodes here, for tunable, PARAM(s) maybe refined.  */
+      if (loop->ninsns <= 6)
+        return MIN (4, nunroll);
       if (loop->ninsns <= 10)
         return MIN (2, nunroll);
-      else
-        return 0;
+
+      return 0;
     }
 
   return nunroll;
-- 
2.25.1
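
For reference, the decision made by the refined hook can be modelled outside GCC roughly as below.  This is only a sketch: unroll_adjust_model, its small_only parameter (standing in for unroll_only_small_loops), the plain ninsns argument (standing in for loop->ninsns) and the local MIN macro are hypothetical names for illustration, not GCC code.

```c
/* Local stand-in for the MIN macro used in the patch (assumption).  */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical model of the refined hook.  NUNROLL is the factor the
   generic unroller proposes, NINSNS the loop's insn count, SMALL_ONLY
   models the unroll_only_small_loops flag.  */
static unsigned
unroll_adjust_model (unsigned nunroll, unsigned ninsns, int small_only)
{
  if (small_only)
    {
      /* Very small loops: allow up to 4x.  */
      if (ninsns <= 6)
        return MIN (4u, nunroll);

      /* Small loops: allow up to 2x, as before the patch.  */
      if (ninsns <= 10)
        return MIN (2u, nunroll);

      /* Anything larger: the hook returns 0.  */
      return 0;
    }

  /* Flag not set: leave the proposed factor unchanged.  */
  return nunroll;
}
```

With a proposed factor of 8, this model returns 4 for a 5-insn loop, 2 for a 9-insn loop and 0 for a 20-insn loop; the result never exceeds the factor passed in.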