Hi! I created a test to reproduce the issue with cpu2006/454.calculix. See attached. The file e_c3d.f contains a cut-down subroutine from calculix; tr535.f is the main entry point of the test. You can use the go script as a reference for how I got these results. find_stall.pl is a script that finds the problematic instruction combinations.
The problem is that the new compiler generates a read instruction right
after a write. See some dumps below.
This is the inner loop near line #42 as generated by the rev. 119759 compiler:
.L13:
.LBB22:
        .loc 1 42 0
        movapd %xmm2, %xmm0
        leaq (%rdx,%rbx), %rax
        .loc 1 38 0
        addl $1, %edi
        addq $24, %rdx
        .loc 1 42 0
        mulsd 72(%rcx), %xmm0
        .loc 1 38 0
        addq $72, %rcx
        cmpl $4, %edi
        .loc 1 42 0
        mulsd %xmm3, %xmm0
        mulsd -8(%rax,%r9,8), %xmm0
        mulsd %xmm4, %xmm0
        addsd %xmm0, %xmm1
        .loc 1 38 0
        jne .L13
This is the code for line 42 as generated by the rev. 119760 compiler:
.L13:
.LBB23:
        .loc 1 42 0
        movsd 72(%rdx), %xmm0
        movq 80(%rsp), %rax
        addq $72, %rdx
        mulsd -8(%r9,%r15,8), %xmm0
        addq %rdi, %rax
        addq $24, %rdi
        .loc 1 38 0
        cmpq $72, %rdi
        .loc 1 42 0
        mulsd -8(%r11,%r14,8), %xmm0
        mulsd -8(%rax,%r13,8), %xmm0
        movq 440(%rsp), %rax
        mulsd (%rax), %xmm0
        addsd (%rsi,%r10,8), %xmm0      <-|
        movsd %xmm0, (%rsi,%r10,8)      <-+- problems
        .loc 1 38 0
        jne .L13
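To make the difference between the two dumps easier to see, here is a rough C sketch of the two loop shapes. It is not the calculix Fortran source, and the function and parameter names are made up for illustration. In the rev. 119759 dump the sum is accumulated in %xmm1 and no store appears inside the loop; in the rev. 119760 dump every iteration adds the value at (%rsi,%r10,8) and stores the result back to the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the rev. 119760 loop shape: the running sum
   lives in memory (*s), so every iteration loads *s and stores it back. */
void accumulate_in_memory(double *s, const double *a, const double *b, size_t n)
{
    assert(s != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        *s = *s + a[i] * b[i];          /* load *s, add, store *s */
}

/* Hypothetical stand-in for the rev. 119759 loop shape: the running sum
   stays in a local variable (a register) and memory is written once,
   after the loop. Equivalent to the version above only if *s does not
   overlap a[] or b[]. */
void accumulate_in_register(double *s, const double *a, const double *b, size_t n)
{
    assert(s != NULL && a != NULL && b != NULL);
    double acc = *s;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    *s = acc;
}
```

Both functions compute the same value when the arrays do not overlap the accumulator; they differ only in whether the partial sum is read from and written to memory on every iteration.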
My output is:
real 0m3.781s
user 0m3.776s
sys 0m0.004s
real 0m5.956s
user 0m5.948s
sys 0m0.004s
hey... we are going
hey... we are going
Line 31
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)
Line 42
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)
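For anyone who wants to run the same kind of check without Perl, below is a small C sketch of the idea. It is not find_stall.pl itself (that script is in the attachment); the function name find_load_store_pairs and the matching rules are my assumptions. It reports an instruction that reads a memory operand into an %xmm register when the next instruction is a movsd that stores the same register to the same operand, and it takes the source line from the last .loc directive seen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_CAP 128
#define OPND_CAP 64

/* Copy the next line of p into buf (leading blanks and '\n' dropped).
   Returns a pointer past the line, or NULL when the input is exhausted. */
static const char *next_line(const char *p, char *buf, size_t cap)
{
    if (*p == '\0')
        return NULL;
    while (*p == ' ' || *p == '\t')
        p++;
    size_t n = strcspn(p, "\n");
    size_t k = n < cap - 1 ? n : cap - 1;
    memcpy(buf, p, k);
    buf[k] = '\0';
    p += n;
    return *p == '\n' ? p + 1 : p;
}

/* Scan AT&T-syntax assembly text. Print and count every "op MEM, %xmmN"
   that is immediately followed by "movsd %xmmN, MEM" (same N, same MEM). */
int find_load_store_pairs(const char *text, FILE *out)
{
    char line[LINE_CAP], prev[LINE_CAP] = "";
    char op[16], mem[OPND_CAP], load_mem[OPND_CAP] = "";
    int reg = -1, load_reg = -1, have_load = 0, src_line = 0, count = 0;

    assert(text != NULL && out != NULL);
    while ((text = next_line(text, line, sizeof line)) != NULL) {
        if (line[0] == '\0')
            continue;
        if (line[0] == '.') {           /* directive or local label */
            (void)sscanf(line, ".loc %*d %d", &src_line);
            continue;
        }
        /* Is this a store that writes back what the previous line loaded? */
        if (have_load
            && sscanf(line, "movsd %%xmm%d, %63[^)]", &reg, mem) == 2
            && reg == load_reg && strcmp(mem, load_mem) == 0) {
            fprintf(out, "Line %d\n%s\n%s\n", src_line, prev, line);
            count++;
        }
        /* Remember whether this line reads a "(...)" operand into %xmmN. */
        have_load = sscanf(line, "%15s %63[^)]), %%xmm%d",
                           op, load_mem, &load_reg) == 3;
        strcpy(prev, line);
    }
    return count;
}
```

Calling find_load_store_pairs(asm_text, stdout) on the text of a compiler-generated .s file prints one "Line N" block per match and returns the number of matches. The check is purely textual and only looks at adjacent instructions, so it is a screening aid, not a proof that a stall occurs.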
Feel free to ask if you run into any problems reproducing this.
-Vladimir
------
* From: Grigory Zagorodnev <grigory_zagorodnev at linux dot intel dot com>
* To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
* Cc: "H. J. Lu" <hjl at lucon dot org>
* Date: Mon, 15 Jan 2007 17:59:31 +0300
* Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix
Hi!
A huge regression in gcc 4.3 performance has been detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on
x86_64-redhat-linux.
The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760
PS: I'm trying to get a small reproducer
- Grigory
test_calculix.tar.bz2
Description: BZip2 compressed data
