--- Comment #21 from ubizjak at gmail dot com 2007-04-06 07:37 ---
Strange things happen.
I have fully removed gcc build directory and bootstrapped gcc from scratch. To
my surprise, the difference with -msse and without -msse is now gone and
optimized dumps are now the same. For referenc
--- Comment #20 from ubizjak at gmail dot com 2007-04-05 19:39 ---
(In reply to comment #19)
> What are you using for a compiler? I'm using a mainline from mid-March, and with
> it, my .optimized files diff exactly the same
gcc version 4.3.0 20070404 (experimental) on i686-pc-linux-gnu
--- Comment #19 from amacleod at redhat dot com 2007-04-05 17:24 ---
What are you using for a compiler? I'm using a mainline from mid-March, and with
it, my .optimized files diff exactly the same, and I get the aforementioned
time differences in the executables.
(sse.c and sse-bad.c are s
--- Comment #18 from ubizjak at gmail dot com 2007-04-05 16:39 ---
(In reply to comment #17)
> Is the output from .optimized different? (once the SSA version numbers have
> been stripped). Those PHIs should be irrelevant, the question is whether the
> different versioning has any eff
--- Comment #17 from amacleod at redhat dot com 2007-04-05 14:23 ---
Is the output from .optimized different? (once the SSA version numbers have
been stripped). Those PHIs should be irrelevant, the question is whether the
different versioning has any effect.
The only way I can t
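
The comparison asked for above (diffing .optimized dumps once the SSA version numbers are stripped) can be sketched roughly as follows. This is only an illustration: the regular expression is a crude guess at what counts as a version suffix, the helper names are made up, and the version numbers in the second sample dump are invented for the example. The PHI lines are reproduced as they appear in this thread, without their arguments.

```python
import difflib
import re

# Assumption: an SSA version is a trailing "_<digits>" on an identifier,
# as in "v1z_10". This is a crude pattern, not GCC's own definition.
SSA_VERSION = re.compile(r"(?<=\w)_\d+\b")


def strip_ssa_versions(text):
    """Return the dump text with trailing _<digits> suffixes removed."""
    return SSA_VERSION.sub("", text)


def diff_dumps(dump_a, dump_b):
    """Unified diff of two dumps after stripping SSA version numbers."""
    a_lines = strip_ssa_versions(dump_a).splitlines()
    b_lines = strip_ssa_versions(dump_b).splitlines()
    return list(difflib.unified_diff(a_lines, b_lines, lineterm=""))


# Hypothetical sample data; the numbers in with_sse are invented.
without_sse = "# v1z_10 = PHI\n# v1y_9 = PHI\n# v1x_8 = PHI\n"
with_sse = "# v1z_31 = PHI\n# v1y_30 = PHI\n# v1x_29 = PHI\n"

if __name__ == "__main__":
    for line in diff_dumps(without_sse, with_sse):
        print(line)
```

An empty diff would mean the two dumps differ only in their version numbers. A change in the order of the statements would still show up, since only the suffixes are removed.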
--- Comment #16 from dnovillo at redhat dot com 2007-04-05 13:15 ---
Subject: Re: Floating point computation far slower for -mfpmath=sse
bonzini at gnu dot org wrote on 04/05/07 08:03:
> Is there a way to ensure ordering of PHI functions unlike what Uros's
> dumps suggest?
No.
I al
--- Comment #15 from bonzini at gnu dot org 2007-04-05 13:03 ---
Transformations do not, but out-of-SSA could. Is there a way to ensure
ordering of PHI functions unlike what Uros's dumps suggest?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780
--- Comment #14 from dnovillo at gcc dot gnu dot org 2007-04-05 12:49 ---
(In reply to comment #11)
> So, why does SSA pass have to interfere with computation dataflow? This
> interference makes things worse and effectively takes away user's control on
> the flow of data.
>
Huh? H
--- Comment #12 from ubizjak at gmail dot com 2007-04-05 11:00 ---
(In reply to comment #11)
> with -msse compile flag. Note different variable suffixes that create
> different sort order. This is (IMO) due to fact that -msse enables lots of
> additional __builtin functions (these ca
--- Comment #11 from ubizjak at gmail dot com 2007-04-05 10:58 ---
(In reply to comment #10)
> I would look at the lreg output, which contains the results of regclass.
No, the difference is due to ssa pass that generates:
# v1z_10 = PHI
# v1y_9 = PHI
# v1x_8 = PHI
# i_7 = PH
--- Comment #13 from bonzini at gnu dot org 2007-04-05 11:01 ---
So this is an unstable sorting. Adding dnovillo.
--
bonzini at gnu dot org changed:
What|Removed |Added
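
To illustrate the kind of instability being discussed, here is a toy model (not taken from GCC's sources) of how an ordering can change when every numeric ID is shifted by the same amount, as might happen if extra declarations are created first. The bucket count, the starting IDs, and the function name are all assumptions made up for this sketch.

```python
def iteration_order(names, first_uid, buckets=8):
    """Toy model: hand out sequential UIDs starting at first_uid, then
    walk a bucketed table keyed on uid % buckets, bucket by bucket."""
    table = [[] for _ in range(buckets)]
    for offset, name in enumerate(names):
        uid = first_uid + offset
        table[uid % buckets].append(name)
    return [name for bucket in table for name in bucket]


names = ["i", "v1x", "v1y", "v1z"]

# Hypothetical starting UIDs: the second run pretends that six extra
# declarations were created before the variables of interest.
baseline = iteration_order(names, first_uid=7)
shifted = iteration_order(names, first_uid=13)

if __name__ == "__main__":
    print(baseline)
    print(shifted)
```

The same four names come out in a different order in the two runs even though their relative UID order is unchanged, which is the sense in which an ID-dependent ordering is unstable.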
--- Comment #10 from bonzini at gnu dot org 2007-04-03 13:36 ---
I would look at the lreg output, which contains the results of regclass.
--- Comment #9 from ubizjak at gmail dot com 2007-04-03 13:32 ---
(In reply to comment #8)
> What's the generated code for -ffast-math? In principle I don't see a reason
> why it should make any difference...
Trying to answer your question, I have played a bit with compile flags and
thi
--- Comment #8 from bonzini at gnu dot org 2007-04-03 12:43 ---
What's the generated code for -ffast-math? In principle I don't see a reason
why it should make any difference...
--- Comment #7 from uros at kss-loka dot si 2006-10-25 12:18 ---
(In reply to comment #6)
> On Xeon 3.6, SSE is now faster:
... but for -ffast-math:
SSE: user 0m0.756s
x87: user 0m0.612s
Yes, x87 is faster for -ffast-math by some 20%.
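
As a quick check of the "some 20%" figure, the two user times above can be compared directly. This sketch expresses the difference relative to the slower (SSE) run; the function and variable names are made up for the example.

```python
def percent_less_time(slow_s, fast_s):
    """Percentage of the slower run's time saved by the faster run."""
    return 100.0 * (slow_s - fast_s) / slow_s


sse_user_s = 0.756  # SSE user time from the timings above
x87_user_s = 0.612  # x87 user time from the timings above

saving = percent_less_time(sse_user_s, x87_user_s)

if __name__ == "__main__":
    print(f"x87 uses {saving:.1f}% less user time than SSE")
```

Measured the other way round, the SSE run takes about 1.24 times as long as the x87 run.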
--- Comment #6 from uros at kss-loka dot si 2006-10-25 12:04 ---
(In reply to comment #5)
> With more registers (x86_64) the stack moves are gone, but: (!)
> (testing done on AMD Athlon fam 15 model 35 stepping 2)
On Xeon 3.6, SSE is now faster:
gcc -O2 -march=pentium4 -mfpmath=387 pr
--- Comment #5 from rguenth at gcc dot gnu dot org 2006-10-24 13:28 ---
With more registers (x86_64) the stack moves are gone, but: (!)
[EMAIL PROTECTED]:/abuild/rguenther/trunk-g/gcc> ./xgcc -B. -O2 -o t t.c -mfpmath=387
[EMAIL PROTECTED]:/abuild/rguenther/trunk-g/gcc> /usr/bin/time ./
--- Comment #4 from bonzini at gnu dot org 2006-08-11 10:22 ---
Except that PPC uses 12 registers f0 f6 f7 f8 f9 f10 f11 f12 f13 f29 f30 f31.
Not that we can blame GCC for using 12, but it is not a fair comparison. :-)
In fact, 8 registers are enough, but it is quite tricky to obtain t
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-29 04:05 ---
Confirmed. This is weird, and this is an RA issue. I don't understand why the
RA is spilling it to the stack, as there are enough SSE registers to hold the
6 registers.
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-29 04:06 ---
Oh, and this looks closely related to the two-operand instruction issue.
PPC gives optimal code:
L2:
fmul f0,f6,f9
fmul f13,f7,f10
fmul f12,f8,f11
fmsub f29,f8,f10,f0
fmsub