https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113679
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
And while SSE/SSE2 has instructions for performing arithmetic in IEEE754 single and double formats, x87 does not. Everything is done in extended precision (unless the FPU is configured to use a smaller precision, but then it doesn't support the extended precision long double on the other side), and conversions to IEEE754 single/double have to be done when storing the extended precision registers into memory.

So, it is impossible to achieve the expected IEEE754 single and double arithmetic behavior; one can get only something close to it (but with double rounding problems) if all the temporaries are immediately stored into memory and loaded from it again. The -ffloat-store option does it to a limited extent (doesn't convert everything though), but still, the performance is terrible.

C allows extended precision and specifies how it should behave; that is the -fexcess-precision=standard model (e.g. enabled by default for -std=c{99,11,...} options as opposed to -std=gnu...). It consistently uses the excess precision, with some casts/assignments mandating rounding to lower precisions. In contrast, -fexcess-precision=fast is what gcc implemented before the standard model was introduced: excess precision is used there as long as something is kept in the FPU registers, and conversions are done when it needs to be spilled to memory.
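To make the difference concrete, here is a minimal sketch in C (not taken from the bug report). The function names are hypothetical, and the inputs are chosen so that 1.0 + 0x1p-53 is an exact tie in double but representable in the x87 extended format. The result therefore depends on whether the intermediate sum is rounded to double.

```c
#include <float.h>

/* Hypothetical helper: reports how the implementation evaluates floating
   expressions. 0 means "in the declared type" (typical with SSE2),
   2 means "in long double" (typical for x87 under the standard model). */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* The intermediate sum is assigned to a double. Under the
   -fexcess-precision=standard model that assignment rounds to double;
   under -fexcess-precision=fast on x87, t may stay in an FPU register
   and keep its extra bits. */
double sum_rounded_each_step(double a, double b, double c)
{
    double t = a + b;
    return t + c;
}

/* No assignment or cast in the middle: on x87, a + b may be held in
   extended precision before c is added. */
double sum_maybe_excess(double a, double b, double c)
{
    return a + b + c;
}
```

With a = 1.0 and b = c = 0x1p-53, pure double arithmetic rounds 1.0 + 0x1p-53 back to 1.0 (ties to even), so both functions return 1.0 when FLT_EVAL_METHOD is 0. If the intermediate is kept in extended precision, the two small terms accumulate exactly and the final result can be 1.0 + DBL_EPSILON instead. Printing the results with `%a` under different `-m32 -mfpmath=387`, `-ffloat-store` and `-fexcess-precision=` settings is one way to observe this; which value appears depends on the target and optimization level.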