Hi all,

This patch is a consequence of the thread I started at
https://gcc.gnu.org/ml/gcc/2016-01/msg00100.html

The problem is that the fold_rtx call in cse_insn may overwrite its argument
if the insn argument is non-NULL. This leads to CSE not considering the
original form of the RTX when doing its cost analysis later on. In my case
this made it pick a normal SImode multiply expression over the original
multiply-sign-extend expression, which is cheaper (as reflected in the fixed
rtx costs from patch [2/4]).
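For reference, the kind of source code affected looks roughly like the
following dot-product kernel (a sketch in the spirit of the
gcc.target/arm/wmul tests, not the exact test source): the multiply of two
sign-extended shorts feeding an accumulator should stay a
sign-extend+multiply(+accumulate) pattern rather than being costed as a
plain SImode multiply.

  /* Sketch only, loosely modelled on the gcc.target/arm/wmul tests.  */
  int
  dot_prod (const short *a, const short *b, int len)
  {
    int acc = 0;
    for (int i = 0; i < len; i++)
      acc += a[i] * b[i];
    return acc;
  }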
The simple fix is to pass NULL to fold_rtx so that it returns the candidate
folded expression in src_folded while the original src is retained for the
later analysis; a simplified sketch of the intent is included just before
the diff below.

With this change and the costs fix in patch [2/4], the
gcc.target/arm/wmul-[12].c tests now generate their expected
sign-extend+multiply (+accumulate) sequences. Apart from that, this patch
has no codegen impact on SPEC2006 for arm. For aarch64 the impact is minimal
and inconsequential: I've seen sequences that select between 1 and -1 turned
from a CSINC (of zero) into a CSNEG; both are valid and of equal value. On
x86_64 the impact was also minimal. Most benchmarks were not changed at all;
some showed a negligible reduction in code size and slight
register-allocation perturbations, but nothing significant. Hence I claim
that this patch is low impact.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

	* cse.c (cse_insn): Pass NULL to fold_rtx when initially
	folding the source of a SET.
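To illustrate the intent of the change, here is a heavily simplified,
hypothetical sketch (not the literal cse_insn code; it assumes the GCC
internals visible in gcc/cse.c, i.e. the rtx type and fold_rtx):

  /* Hypothetical sketch, as if written inside gcc/cse.c.  */
  static void
  fold_set_src_sketch (rtx src)
  {
    /* Passing NULL instead of the insn asks fold_rtx only to compute a
       folded candidate; SRC itself is left untouched.  */
    rtx src_folded = fold_rtx (src, NULL);

    /* Both forms now survive to the cost comparison, e.g.
         src:        (mult:SI (sign_extend:SI (reg:HI x))
                              (sign_extend:SI (reg:HI y)))
         src_folded: (mult:SI (reg:SI x') (reg:SI y'))
       so the cheaper widening form (per the rtx costs in patch [2/4])
       can still be chosen.  */
    (void) src_folded;
  }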
diff --git a/gcc/cse.c b/gcc/cse.c
index 58b8fc0313dcbfb2036054564746a7832ae52140..2665d9a2733cad7286b41a88753acfcf79be83f1 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4636,7 +4636,7 @@ cse_insn (rtx_insn *insn)
 
       /* Simplify and foldable subexpressions in SRC.  Then get the fully-
          simplified result, which may not necessarily be valid.  */
-      src_folded = fold_rtx (src, insn);
+      src_folded = fold_rtx (src, NULL);
 
 #if 0
       /* ??? This caused bad code to be generated for the m68k port with -O2.