https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28366
--- Comment #3 from Martin Sebor <msebor at gcc dot gnu.org> ---
What I meant by suboptimal is the eight vector stores, when it seems that just
two instructions are needed to save the two vector registers that hold the
arguments, as Clang does:

        addi 3, 1, -48
        addi 4, 1, -64
        stvx 3, 0, 3
        stvx 2, 0, 4
        addi 3, 1, -32
        nop
        lfs 0, -36(1)
        lfs 1, -52(1)
        lfs 2, -40(1)
        lfs 3, -56(1)
        lfs 4, -44(1)
        lfs 5, -60(1)
        lfs 12, -48(1)
        lfs 6, -64(1)
        fdivs 0, 1, 0
        fdivs 2, 3, 2
        fdivs 13, 5, 4
        fdivs 1, 6, 12
        stfs 0, -20(1)
        stfs 2, -24(1)
        stfs 13, -28(1)
        stfs 1, -32(1)
        nop
        lvx 2, 0, 3
        blr
        .long 0
        .quad 0
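
For readers without the test case at hand: this comment does not quote it, so the function below is only a guess at its shape. The Clang listing is consistent with an element-wise division of two four-float vectors: v2 and v3 are spilled with stvx, each of the four element pairs is divided with fdivs (v2 element over v3 element), and the result is reloaded into v2 with lvx. The names v4sf and div4 are hypothetical and chosen for illustration.

```c
/* Hypothetical reduced test case; the actual test case is not quoted
   in this comment, so this is an assumption about its shape.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Element-wise division of two 4 x float vectors.  In the listing the
   arguments are held in v2 and v3 and the quotient is reloaded into v2
   before the blr.  */
v4sf
div4 (v4sf a, v4sf b)
{
  return a / b;
}
```

Compiling a function like this at -O2 for a PowerPC target with both compilers and counting the stvx instructions should be enough to compare the spill sequences.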