--- Additional Comments From bangerth at dealii dot org 2004-12-01 20:59
---
The two spinoffs are PR 18766 and PR 18767.
W.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619
--- Additional Comments From bangerth at dealii dot org 2004-12-01 20:49
---
In reply to comment #6:
> Please note, that we should return the result in fp reg, so final flds is
> needed in any case. I think, this code is optimal.
Almost, or at least I believe so. If we assume that
--- Additional Comments From uros at gcc dot gnu dot org 2004-12-01 16:02
---
If the loop is split manually and a, b and c are put inside the foobar()
function [otherwise the vectorizer complains about an unaligned load]:
--cut here--
struct X
{
float array[4];
};
float foobar()
{
X a,
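For readers unfamiliar with the idea, here is a minimal sketch of what manually
splitting such a loop might look like. It assumes the original loop multiplies
two float arrays elementwise and sums the products; that is an assumption, not
something the test case here confirms. The names dot_fused and dot_split are
hypothetical and appear nowhere in the PR.

```cpp
struct X
{
  float array[4];
};

// Hypothetical fused form: multiply and accumulate in a single loop.
float dot_fused (const X &a, const X &b)
{
  float sum = 0.0f;
  for (int i = 0; i < 4; ++i)
    sum += a.array[i] * b.array[i];
  return sum;
}

// Hypothetical split form: the elementwise multiply goes into its own
// loop (a candidate for a single packed multiply), followed by a
// separate reduction loop over the temporary c.
float dot_split (const X &a, const X &b)
{
  X c;
  for (int i = 0; i < 4; ++i)
    c.array[i] = a.array[i] * b.array[i];

  float sum = 0.0f;
  for (int i = 0; i < 4; ++i)
    sum += c.array[i];
  return sum;
}
```

Both forms add the products in the same order, so they should return the same
value; whether either loop is actually vectorized depends on the compiler
version and flags.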
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-01 14:27
---
Actually the most optimal code would be:
_Z6foobarv:
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
subl $24, %esp
.LCFI2:
movaps a, %xmm0
mulps b,
--- Additional Comments From uros at gcc dot gnu dot org 2004-12-01 14:07
---
With "GCC: (GNU) 4.0.0 20041201 (experimental)", the following code is produced
(without -ffast-math):
_Z6foobarv:
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
subl $4, %esp
.LCFI2: