On 7/21/23 11:31, Palmer Dabbelt wrote:
> IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in
> the same way, but I think we might be able to come up with a test for
> it: `fmv.d.x FREG, x0` would be the fastest way to generate 0.0, so
> maybe something like
>
>     double sum(double *d) {
>       double sum = 0;
>       for (int i = 0; i < 8; ++i)
>         sum += d[i];
>       return sum;
>     }
>
> would do it?  That's generating the fmv on 13 for me, though, so maybe
> I'm missing something?

I don't think we can avoid the FMV in this case:

        fmv.d.x fa0,zero        #1
        addi    a5,a0,64
.L2:
        fld     fa5,0(a0)
        addi    a0,a0,8
        fadd.d  fa0,fa0,fa5     #2
        bne     a0,a5,.L2
        ret

In #1, the zero needs to be set up in an FP register (possibly using
FMV), since in #2 it will be used for FP math.
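
As a side note on why moving x0 into an FP register gives 0.0 at all: fmv.d.x copies the integer register's bits into the FP register unchanged, and the all-zero bit pattern is +0.0 in IEEE 754 binary64. A minimal sketch in plain C, assuming a 64-bit IEEE 754 double (bits_to_double is a made-up helper name for illustration, not anything in GCC):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit integer pattern as a double,
   which is what fmv.d.x does in hardware.  Assumes double is IEEE 754
   binary64 and 8 bytes wide. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);   /* bit copy, no numeric conversion */
    return d;
}
```

Passing 0 here yields 0.0, which is why the zero register can serve as the source for an FP zero.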

If we change your test slightly,

    double zadd(double *d) {
      double sum = 0.0;
      for (int i = 0; i < 8; ++i)
        d[i] = sum;
      return sum;
    }

We still get the optimal code for storing FP 0.0. The final FMV is
unavoidable, as we need the return value in an FP register:

        addi    a5,a0,64
.L2:
        sd      zero,0(a0)
        addi    a0,a0,8
        bne     a0,a5,.L2
        fmv.d.x fa0,zero
        ret
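
If we wanted to turn this into a testcase, it could look roughly like the sketch below. This is only an illustration, not the test from the patch: the -march/-mabi/-O2 options and the scan patterns are my assumptions, derived from the listing above (one sd of zero in the loop, one fmv.d.x for the return value), and they may need adjusting for other tunings.

```c
/* Sketch of a compile-only testcase; options and patterns are assumed. */
/* { dg-do compile } */
/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */

double zadd(double *d)
{
    double sum = 0.0;

    /* Each store of FP 0.0 should come straight from the integer zero
       register, with no FP register involved. */
    for (int i = 0; i < 8; ++i)
        d[i] = sum;

    /* Returning 0.0 needs an FP register, so one fmv.d.x is expected. */
    return sum;
}

/* { dg-final { scan-assembler "sd\tzero" } } */
/* { dg-final { scan-assembler-times "fmv\\.d\\.x" 1 } } */
```

The point of counting the fmv.d.x occurrences is to catch the case where the zero is first materialized in an FP register and then stored with fsd inside the loop.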