This patch attempts to fix a libgcc codegen regression introduced in gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2.
The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch adds to the loop distribution pass some cost analysis based on preexisting *_RATIO macros, so that we won't transform loops with trip counts as low as the ratios we'd rather expand inline. This patch is not finished; it needs adjustments to the testsuite, to make up for the behavior changes it brings about. Specifically, on a x86_64-linux-gnu regstrap, it regresses: > FAIL: gcc.dg/pr53265.c (test for warnings, line 40) > FAIL: gcc.dg/pr53265.c (test for warnings, line 42) > FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 > library cal> FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++14 scan-tree-dump > ldist "split to 0 loops and 1 library calls" > FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++17 scan-tree-dump ldist "split to > 0 loops and 1 library calls" > FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++2a scan-tree-dump ldist "split to > 0 loops and 1 library calls" I suppose just lengthening the loops will take care of ldist-38 and pr78847, but the loss of the warnings in pr53265 is more concerning, and will require investigation. Nevertheless, I seek feedback on whether this is an acceptable approach, or whether we should use alternate tuning parameters for ldist, or something entirely different. Thanks in advance, for gcc/ChangeLog * tree-loop-distribution.c (maybe_normalize_partition): New. (loop_distribution::distribute_loop): Call it. [requires testsuite adjustments and investigation of a warning regression] --- gcc/tree-loop-distribution.c | 54 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index bb15fd3723fb6..b5198652817ee 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec<struct partition *> *partitions) } } +/* Return false if it's profitable to turn the LOOP PARTITION into a builtin + call, and true if it wasn't, changing the PARTITION to PKIND_NORMAL. */ + +static bool +maybe_normalize_partition (class loop *loop, struct partition *partition) +{ + unsigned HOST_WIDE_INT ratio; + + switch (partition->kind) + { + case PKIND_NORMAL: + case PKIND_PARTIAL_MEMSET: + return false; + + case PKIND_MEMSET: + if (integer_zerop (gimple_assign_rhs1 (DR_STMT + (partition->builtin->dst_dr)))) + ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop)); + else + ratio = SET_RATIO (optimize_loop_for_speed_p (loop)); + break; + + case PKIND_MEMCPY: + case PKIND_MEMMOVE: + ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop)); + break; + + default: + gcc_unreachable (); + } + + tree niters = number_of_latch_executions (loop); + if (niters == NULL_TREE || niters == chrec_dont_know) + return false; + + wide_int minit, maxit; + value_range_kind vrk = determine_value_range (niters, &minit, &maxit); + if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio)) + { + partition->kind = PKIND_NORMAL; + return true; + } + + return false; +} + void loop_distribution::finalize_partitions (class loop *loop, vec<struct partition *> *partitions, @@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts, } finalize_partitions (loop, &partitions, &alias_ddrs); + { + bool any_changes_p = false; + for (i = 0; partitions.iterate (i, &partition); ++i) + if (maybe_normalize_partition (loop, partition)) + any_changes_p = true; + if (any_changes_p) + finalize_partitions (loop, &partitions, &alias_ddrs); + } /* If there is a reduction in all partitions make sure the last one is not classified for builtin code generation. */ -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar