The following patch has exposed an optimization shortcoming:
2005-07-12 Dale Johannesen <[EMAIL PROTECTED]>
* expr.c (compress_float_constant): Add cost check.
* config/rs6000.c (rs6000_rtx_cost): Adjust FLOAT_EXTEND cost.
This patch results in worse code being generated for the following test case:
struct S {
float d1, d2, d3;
};
S ms()
{
struct S s = {0,0,0};
return s;
}
With -O1 -mdynamic-no-pic -march=pentium4 -mtune=prescott, gcc now generates:
pxor %xmm0, %xmm0
movsd %xmm0, (%eax)
...
Instead of:
movl $0, (%eax)
movl $0, 4(%eax)
...
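The movl $0 stores in the expected sequence are valid only because an IEEE 754 single-precision +0.0 is encoded as all zero bits. A minimal C sketch of that check follows; it assumes float is a 32-bit IEEE single (true on x86), and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 single, as on x86. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: true when F can be stored with an integer zero
   store such as "movl $0, (%eax)", i.e. its bit pattern is all zeros. */
static bool float_is_all_zero_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy out the object representation */
    return bits == 0;
}
```

+0.0f passes this check, while -0.0f (sign bit set) and 1.0f do not, so the integer-store trick applies only to positive zero.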
This is because the change to compress_float_constant produces an RTL
pattern that cse cannot optimize.
Before the above patch, compress_float_constant generated:
(insn 12 7 13 0 (set (reg:SF 59)
        (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil)
    (nil))
(insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 <result>.d1+0 S8 A32])
        (float_extend:DF (reg:SF 59))) -1 (nil)
    (nil))
cse was then able to constant-propagate the double 0.0, resulting
in the following pattern:
(insn 13 7 15 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 <result>.d1+0 S8 A32])
        (const_double:DF 0.0 [0x0.0p+0])) 64 {*movdf_nointeger} (nil)
    (nil))
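compress_float_constant only tries a narrower mode when the constant survives truncation exactly (GCC checks this with exact_real_truncate); that is why the DFmode 0.0 could be loaded as an SFmode constant and widened with float_extend. The sketch below approximates that test with host arithmetic. It is a hypothetical helper, not GCC code, and it assumes IEEE (C Annex F) semantics, under which out-of-range doubles convert to infinities.

```c
#include <stdbool.h>

/* Hypothetical approximation of the exactness test used by
   compress_float_constant: a double constant may be emitted as a float
   constant plus float_extend only if the round trip loses nothing.
   NaNs compare unequal to themselves, so they are rejected here. */
static bool narrows_exactly_to_float(double d)
{
    volatile float f = (float) d;  /* volatile forces rounding to single precision */
    return (double) f == d;
}
```

For example, 0.0 and 1.5 narrow exactly, while 0.1 does not, because its nearest float differs from its nearest double.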
With the latest gcc (which includes the above patch),
compress_float_constant's new cost computation disallows generating
float_extend:DF, so cse is faced with this new pattern:
(insn 12 11 13 0 s.C:7 (set (reg:DF 59)
        (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) -1 (nil)
    (nil))
(insn 13 12 14 0 s.C:7 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 <result>.d1+0 S8 A32])
        (reg:DF 59)) -1 (nil)
    (nil))
As soon as cse sees a REG node as the source, it gives up.
What is the right way to restore this optimization?
1) Can cse be taught to constant-propagate when the source is a REG rtx?
Why was this never attempted before? Fixing cse seems to fix both
the double-float case and the single-float case (which is not affected
by the above patch).
2) For the double-float case (as illustrated by the above test case), can
we tweak compress_float_constant to skip the cost computation of the
RHS when the LHS is a store to memory? This fixes the performance
regressions and caused no new regressions. The attached patch is what I tried.
3) Any other approach?
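Option 1 amounts to forward constant propagation through registers. The toy model below sketches that idea on a single basic block: it remembers which registers hold known constants (including values loaded from a read-only constant pool such as *LC0) and replaces a REG source with that constant, even when the destination is memory. Everything here (the operand kinds, struct insn, propagate_constants) is hypothetical and far simpler than GCC's cse; it ignores aliasing, multiple blocks, and whether the target accepts the constant as a store operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 128  /* arbitrary size for the toy register file */

/* Toy operand kinds; hypothetical, not GCC's rtx codes. */
enum operand_kind {
    OPND_REG,    /* a pseudo register, like (reg:DF 59) */
    OPND_CONST,  /* an immediate, like (const_double:DF 0.0) */
    OPND_POOL,   /* a read-only constant-pool load, like (mem/u ... "*LC0") */
    OPND_MEM     /* ordinary memory with unknown contents */
};

struct operand {
    enum operand_kind kind;
    int regno;     /* meaningful for OPND_REG only */
    double value;  /* meaningful for OPND_CONST and OPND_POOL */
};

/* Models a single (set DEST SRC) insn. */
struct insn {
    struct operand dest;
    struct operand src;
};

/* Forward constant propagation over one straight-line block. */
static void propagate_constants(struct insn *insns, size_t n)
{
    bool known[NUM_REGS] = { false };
    double val[NUM_REGS] = { 0.0 };

    for (size_t i = 0; i < n; i++) {
        struct operand *src = &insns[i].src;
        struct operand *dest = &insns[i].dest;
        bool src_is_const = false;
        double c = 0.0;

        switch (src->kind) {
        case OPND_CONST:
        case OPND_POOL:
            src_is_const = true;
            c = src->value;
            break;
        case OPND_REG:
            assert(src->regno >= 0 && src->regno < NUM_REGS);
            if (known[src->regno]) {
                /* The step option 1 asks for: substitute the constant
                   for the REG source, whatever the destination is. */
                src->kind = OPND_CONST;
                src->value = val[src->regno];
                src_is_const = true;
                c = src->value;
            }
            break;
        case OPND_MEM:
            break;
        }

        /* Record (or forget) the value of a register destination. */
        if (dest->kind == OPND_REG) {
            assert(dest->regno >= 0 && dest->regno < NUM_REGS);
            known[dest->regno] = src_is_const;
            val[dest->regno] = c;
        }
    }
}
```

Fed the two insns from the dump above (reg 59 loaded from the pool, then stored to memory), the store's source becomes the constant 0.0, which mirrors the pattern cse produced before the patch.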
- Thanks, fariborz
Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.778.4.5
diff -c -p -r1.778.4.5 expr.c
*** expr.c 13 Jul 2005 01:07:47 -0000 1.778.4.5
--- expr.c 10 Aug 2005 18:55:49 -0000
*************** compress_float_constant (rtx x, rtx y)
*** 3187,3196 ****
the extension. */
if (! (*insn_data[ic].operand[1].predicate) (trunc_y, srcmode))
continue;
! /* This is valid, but may not be cheaper than the original. */
! newcost = rtx_cost (gen_rtx_FLOAT_EXTEND (dstmode, trunc_y), SET);
! if (oldcost < newcost)
! continue;
}
else if (float_extend_from_mem[dstmode][srcmode])
{
--- 3187,3199 ----
the extension. */
if (! (*insn_data[ic].operand[1].predicate) (trunc_y, srcmode))
continue;
! if (!MEM_P (x))
! {
! /* This is valid, but may not be cheaper than the original. */
! newcost = rtx_cost (gen_rtx_FLOAT_EXTEND (dstmode, trunc_y), SET);
! if (oldcost < newcost && (!MEM_P (x)) )
! continue;
! }
}
else if (float_extend_from_mem[dstmode][srcmode])
{
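The only behavioral change in the patch above is which condition rejects a narrowed constant. Distilled into a standalone predicate, it looks roughly like the sketch below; it is hypothetical, with plain int costs standing in for rtx_cost () results and a bool standing in for MEM_P (x).

```c
#include <stdbool.h>

/* Hypothetical distillation of the cost test in compress_float_constant.
   Returns true when "narrowed constant + float_extend" is rejected.
   apply_patch selects the behavior proposed in the patch above. */
static bool reject_float_extend(int oldcost, int newcost, bool dest_is_mem,
                                bool apply_patch)
{
    /* Proposed: stores to memory skip the cost comparison entirely. */
    if (apply_patch && dest_is_mem)
        return false;
    /* Current: reject when the extension costs more than the original. */
    return oldcost < newcost;
}
```

With made-up costs of 4 (original) and 8 (extension), a store is rejected without the patch and accepted with it, while a register destination is rejected either way. Inside the patch's if (!MEM_P (x)) block, the extra && (!MEM_P (x)) test is always true, so it does not change this logic.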