On 7/21/23 11:31, Palmer Dabbelt wrote:

IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same way, but I think we might be able to come up with a test for it:
`fmv.d.x FREG, x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
     double sum = 0;
     for (int i = 0; i < 8; ++i)
       sum += d[i];
     return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm missing something?

I don't think we can avoid FMV in this case:

    fmv.d.x    fa0,zero     #1
    addi    a5,a0,64
.L2:
    fld    fa5,0(a0)
    addi    a0,a0,8
    fadd.d    fa0,fa0,fa5   #2
    bne    a0,a5,.L2
    ret

In #1, the zero needs to be set up in an FP reg (possibly using FMV), since in #2 it will be used for FP math.
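
As a side note on why moving the integer zero register works at all: IEEE 754 +0.0 is the all-zero bit pattern, so copying integer 0 into an FP reg gives 0.0. A minimal sketch checking that in plain C, assuming binary64 doubles; the helper names (`bits_of`, `from_bits`, `check_zero_pattern`) are made up for illustration and are not from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* well-defined way to reinterpret bits */
    return u;
}

/* Hypothetical helper: build a double from a raw 64-bit pattern,
   roughly what fmv.d.x does with an integer register. */
static double from_bits(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Self-check, assuming IEEE 754 binary64 doubles. */
static void check_zero_pattern(void)
{
    assert(bits_of(0.0) == 0);      /* +0.0 is all-zero bits */
    assert(from_bits(0) == 0.0);    /* integer 0 reinterpreted is 0.0 */
    assert(bits_of(-0.0) != 0);     /* -0.0 has the sign bit set */
}
```

Note that -0.0 is not all-zero bits, so this shortcut only applies to +0.0.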

If we change your test slightly,

    double zadd(double *d) {
      double sum = 0.0;
      for (int i = 0; i < 8; ++i)
        d[i] = sum;
      return sum;
    }

We still get the optimal code for storing FP 0 to memory (`sd zero`). The last FMV is unavoidable, as the return value has to be in an FP reg (fa0).


    addi    a5,a0,64
.L2:
    sd    zero,0(a0)
    addi    a0,a0,8
    bne    a0,a5,.L2
    fmv.d.x    fa0,zero
    ret
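
To illustrate why `sd zero` is a valid way to store 0.0 here, a rough sketch in C comparing an FP store of 0.0 against an integer store of 0 into the same kind of buffer. This assumes binary64 doubles; the function names (`zero_fill_fp`, `zero_fill_int`, `check_same_result`) are hypothetical and only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store 0.0 the way the C source spells it. */
static void zero_fill_fp(double *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        d[i] = 0.0;
}

/* Store a 64-bit integer zero into each slot instead,
   similar in spirit to sd zero,0(a0). */
static void zero_fill_int(double *d, size_t n)
{
    const uint64_t z = 0;
    for (size_t i = 0; i < n; ++i)
        memcpy(&d[i], &z, sizeof z);
}

/* Both fills should leave identical bytes in memory. */
static void check_same_result(void)
{
    double a[8], b[8];
    memset(a, 0xff, sizeof a);  /* poison first so stale zeros can't hide a bug */
    memset(b, 0xff, sizeof b);
    zero_fill_fp(a, 8);
    zero_fill_int(b, 8);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(b[0] == 0.0 && b[7] == 0.0);
}
```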
