[Bug target/17619] Non-optimal code for -mfpmath=387,sse

2004-12-01 Thread bangerth at dealii dot org
--- Additional Comments From bangerth at dealii dot org 2004-12-01 20:59 --- The two spinoffs are PR 18766 and PR 18767. W. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619

[Bug target/17619] Non-optimal code for -mfpmath=387,sse

2004-12-01 Thread bangerth at dealii dot org
--- Additional Comments From bangerth at dealii dot org 2004-12-01 20:49 --- In reply to comment #6: > Please note, that we should return the result in fp reg, so final flds is > needed in any case. I think, this code is optimal. Almost, or at least I believe so. If we assume that

[Bug target/17619] Non-optimal code for -mfpmath=387,sse

2004-12-01 Thread uros at gcc dot gnu dot org
--- Additional Comments From uros at gcc dot gnu dot org 2004-12-01 16:02 --- If the loop is split manually and a, b and c are put inside the foobar() function [otherwise the vectorizer complains about an unaligned load]: --cut here-- struct X { float array[4]; }; float foobar() { X a,

[Bug target/17619] Non-optimal code for -mfpmath=387,sse

2004-12-01 Thread pinskia at gcc dot gnu dot org
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-01 14:27 --- Actually the most optimal code would be: _Z6foobarv: .LFB2: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: subl $24, %esp .LCFI2: movaps a, %xmm0 mulps b,

[Bug target/17619] Non-optimal code for -mfpmath=387,sse

2004-12-01 Thread uros at gcc dot gnu dot org
--- Additional Comments From uros at gcc dot gnu dot org 2004-12-01 14:07 --- With "GCC: (GNU) 4.0.0 20041201 (experimental)", the following code is produced (without -ffast-math): _Z6foobarv: .LFB2: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: subl $4, %esp .LCFI2: