https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124434

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The loop at -O0:
```
.L3:
        fldt    -16(%rbp) // load into x87 stack
        fldt    -48(%rbp) // load into x87 stack
        fmulp   %st, %st(1) //multiply
        fldt    -64(%rbp) // load into x87 stack
        faddp   %st, %st(1) // add
        fstpt   -16(%rbp) // store from x87 stack
        addl    $1, -20(%rbp)
.L2:
        cmpl    $999999999, -20(%rbp)
        jle     .L3
```

Nothing is kept on the x87 stack between iterations, which slows down x87 code
in general: transferring values between the x87 stack and the normal stack is
slow, and there is no load bypass.

At -O1:
```
.L2:
        fmul    %st(1), %st
        fadd    %st(2), %st
        subl    $1, %eax
        jne     .L2
```

Everything is kept on the x87 stack.
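
For reference, a loop of roughly the following shape would match the two
listings above. The original test case is not quoted in this comment, so this
is only a sketch: the function name `iterate`, its parameter names, and the use
of `long double` (inferred from the 80-bit `fldt`/`fstpt` instructions) are
assumptions, not the reporter's code.

```c
/* Hypothetical reconstruction of the loop body: acc = acc * a + b,
 * repeated n times.  The listings above correspond to a trip count of
 * 1000000000; n is a parameter here so the sketch is easy to try with
 * small values. */
long double iterate(long double acc, long double a, long double b, int n)
{
    for (int i = 0; i < n; i++)
        acc = acc * a + b;   /* one multiply and one add per iteration */
    return acc;
}
```

Compiling this with `gcc -O0 -S` and `gcc -O1 -S` on x86-64 should show the
same difference: at -O0 every variable is reloaded from and stored back to the
stack frame on each iteration, while at -O1 the values can stay in x87
registers for the whole loop.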
