On 11/28/2013, 7:51 PM, H.J. Lu wrote:
On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
The following patch fixes PR57293
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
It is actually an implementation of missed LRA functionality in reg
elimination. Before the patch any explicit change of stack pointer in
RTL resulted in necessity to use the frame pointer.
The patch has practically no effect on generic tuning of x86/x86-64.
But it has a dramatic effect on code performance for other tunings
like corei7 which don't use incoming args accumulation. The maximum
SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000. But
SPECFP2000 rate also has improvement about 1% on x86 and x86-64. Too
bad that I did not implement it at the first place. The results would
have been even much better ones reported on 2012 GNU Cauldron as I
also used -mtune=corei7 that time.
The patch was bootstrapped and tested on x86-64/x86 and ppc.
Committed as rev. 205498.
2013-11-28 Vladimir Makarov<vmaka...@redhat.com>
PR target/57293
* ira.h (ira_setup_eliminable_regset): Remove parameter.
* ira.c (ira_setup_eliminable_regset): Ditto. Add
SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
Don't call lra_init_elimination.
(ira): Call ira_setup_eliminable_regset without arguments.
* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
from ira_setup_eliminable_regset call.
* gcse.c (calculate_bb_reg_pressure): Ditto.
* haifa-sched.c (sched_init): Ditto.
* lra.h (lra_init_elimination): Remove the prototype.
* lra-int.h (lra_insn_recog_data): New member sp_offset. Move
used_insn_alternative upper.
(lra_eliminate_regs_1): Add one more parameter.
(lra-eliminate): Ditto.
* lra.c (lra_invalidate_insn_data): Set sp_offset.
(setup_sp_offset): New.
(lra_process_new_insns): Call setup_sp_offset.
(lra): Add argument to lra_eliminate calls.
* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
(get_equiv_with_elimination): New.
(process_addr_reg): Call get_equiv_with_elimination instead of
get_equiv_substitution.
(equiv_address_substitution): Ditto.
(loc_equivalence_change_p): Ditto.
(loc_equivalence_callback, lra_constraints): Ditto.
(curr_insn_transform): Ditto. Print the sp offset
(process_alt_operands): Prevent stack pointer reloads.
(lra_constraints): Remove one argument from lra_eliminate call.
Move it up. Mark used hard regs bfore it. Use
get_equiv_with_elimination instead of get_equiv_substitution.
* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
assert for param values combination. Use sp offset. Add argument
to lra_eliminate_regs_1 calls.
(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
(curr_sp_change): New static var.
(mark_not_eliminable): Add parameter. Update curr_sp_change.
Don't prevent elimination to sp if we can calculate its change.
Pass the argument to mark_not_eliminable calls.
(eliminate_regs_in_insn): Add a parameter. Use sp offset. Add
argument to lra_eliminate_regs_1 call.
(update_reg_eliminate): Move calculation of hard regs for spill
lower. Switch off lra_in_progress temporarily to generate regs
involved into elimination.
(lra_init_elimination): Rename to init_elimination. Make it
static. Set up insn sp offset, check the offsets at the end of
BBs.
(process_insn_for_elimination): Add parameter. Pass its value to
eliminate_regs_in_insn.
(lra_eliminate): : Add parameter. Pass its value to
process_insn_for_elimination. Add assert for param values
combination. Call init_elimination. Don't update offsets in
equivalence substitutions.
* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
for created stack slot.
(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
onto stack slot.
Hi Vladimir,
Thanks for your hard work. I noticed a few regressions
on x86-64:
FAIL: gcc.dg/guality/pr54519-1.c -O2 line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c -O2 line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c -O2 line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c -O2 line 23 z == 8
FAIL: gcc.dg/guality/pr54519-1.c -O3 -fomit-frame-pointer line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c -O3 -fomit-frame-pointer line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c -O3 -fomit-frame-pointer line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c -O3 -g line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c -O3 -g line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c -O3 -g line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c -O2 line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c -O3 -fomit-frame-pointer line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c -O3 -g line 23 z == 8
FAIL: gcc.dg/guality/pr54693-2.c -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions line 21 i == v + 1
FAIL: gcc.dg/guality/pr54693-2.c -O3 -fomit-frame-pointer
-funroll-loops line 21 i == v + 1
FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
Well, I guess it can happen when you don't use frame-pointer anymore.
Besides guality failures, there is
[hjl@gnu-6 gcc]$ make check-gcc
RUNTESTFLAGS="--target_board='unix{-m32\ -march=corei7}'
i386.exp=pr9771-1.c"
make[1]: Entering directory `/export/build/gnu/gcc/build-x86_64-linux/gcc'
test -d plugin || mkdir plugin
test -d testsuite || mkdir testsuite
test -d testsuite/gcc || mkdir testsuite/gcc
(rootme=`${PWDCMD-pwd}`; export rootme; \
srcdir=`cd /export/gnu/import/git/gcc/gcc; ${PWDCMD-pwd}` ; export srcdir ; \
cd testsuite/gcc; \
rm -f tmp-site.exp; \
sed '/set tmpdir/ s|testsuite$|testsuite/gcc|' \
< ../../site.exp > tmp-site.exp; \
/bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
EXPECT=`if [ -f ${rootme}/../expect/expect ] ; then echo
${rootme}/../expect/expect ; else echo expect ; fi` ; export EXPECT ;
\
if [ -f ${rootme}/../expect/expect ] ; then \
TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
export TCL_LIBRARY ; fi ; \
runtestflags= ; \
if [ -n "" ] ; then \
runtestflags=""; \
elif [ -n "" ] ; then \
parts="`echo ' ' \
| sed 's/=[^ ]* / /g'`"; \
for part in `find $srcdir/testsuite/gcc* -name \*.exp` ; do \
part=`basename $part` ; \
case " $parts $runtestflags " in \
*" $part "*) ;; \
*) runtestflags="$runtestflags $part" ;; \
esac ; \
done ; \
fi ; \
`if [ -f ${srcdir}/../dejagnu/runtest ] ; then echo
${srcdir}/../dejagnu/runtest ; else echo runtest; fi` --tool gcc
--target_board='unix{-m32\ -march=corei7}' i386.exp=pr9771-1.c
$runtestflags)
WARNING: Couldn't find the global config file.
Test Run By hjl on Thu Nov 28 16:50:05 2013
Native configuration is x86_64-unknown-linux-gnu
=== gcc tests ===
Schedule of variations:
unix/-m32 -march=corei7
Running target unix/-m32 -march=corei7
Using /usr/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /export/gnu/import/git/gcc/gcc/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /export/gnu/import/git/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
FAIL: gcc.target/i386/pr9771-1.c (test for excess errors)
May be it is wrong to use this test with this tuning as reload also
fails on this test when -march=corei7 is used. Or most probably the
problem is somewhere else. The patch just triggered it.
=== gcc Summary ===
# of expected passes 10
# of unexpected failures 2
# of unresolved testcases 1
"-m32 -march=corei7" triggers the ICE.