On Thu, Mar 17, 2022 at 8:50 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 8:41 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Implementations of the x87 floating point instruction set have always
> > had some pretty strange characteristics.  For example on the original
> > Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
> > took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
> > when a regular FLD (load from memory) took just 1 cycle!?  Given that
> > back then memory latencies were much lower (relatively) than they are
> > today, these instructions were all but useless except when optimizing
> > for size (impressively FLDZ/FLDPI require only two bytes).
> >
> > Such was the world back in 2006 when Uros Bizjak first added support for
> > fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
> > and then shortly after sensibly disabled them for !optimize_size with
> > https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
> > [which was very expertly reviewed and approved here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204487.html ]
> >
> > "And some things that should not have been forgotten were lost.
> > History became legend.  Legend became myth." -- Lord of the Rings
> >
> > Alas this vestigial logic still persists in the compiler today,
> > so for example on x86_64 for the following function:
> >
> > double foo(double x) { return x + 0.0; }
> >
> > generates with -O2
> >
> > foo:    addsd   .LC0(%rip), %xmm0
> >         ret
> > .LC0:   .long   0
> >         .long   0
> >
> > preferring to read the constant 0.0 from memory [the constant pool],
> > except when optimizing for size.  With -Os we get:
> >
> > foo:    xorps   %xmm1, %xmm1
> >         addsd   %xmm1, %xmm0
> >         ret
> >
> > Which is not only smaller (the two instructions require seven bytes vs.
> > eight for the original addsd from mem, even without considering the
> > constant pool) but is also faster on modern hardware.  The latter code
> > sequence is generated by both clang and msvc with -O2.  Indeed Agner
> > Fog documents the set of floating point/SSE constants that it's
> > cheaper to materialize than to load from memory.
> >
> > This patch shuffles the conditions on the i386 backend's *movtf_internal,
> > *movdf_internal and *movsf_internal define_insns to untangle the newer
> > TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
> > conditions.  Amongst the benefits of this are that it improves the code
> > generated for PR tree-optimization/90356 and resolves PR target/86722.
> > Many thanks to Hongtao whose approval of my PR 94680 "movq" patch
> > unblocked this one.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
> >
> >
> > 2022-03-17  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/86722
> >         PR tree-optimization/90356
> >         * config/i386/i386.md (*movtf_internal): Don't guard
> >         standard_sse_constant_p clause by optimize_function_for_size_p.
> >         (*movdf_internal): Likewise.
> >         (*movsf_internal): Likewise.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/86722
> >         PR tree-optimization/90356
> >         * gcc.target/i386/pr86722.c: New test case.
> >         * gcc.target/i386/pr90356.c: New test case.
>
> OK, and based on your analysis, even obvious.

Maybe a little improvement for tests:

+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse" } */

You could add "-msse2 -mfpmath=sse" to dg-options, so it will also
compile for the ia32 target.

Uros.
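
For illustration, a minimal sketch of a test header with the suggested options applied. The real contents of pr86722.c are not shown in this thread, so the body below is simply the foo example from the patch description, and dropping the { target { ! ia32 } } selector is an assumption about what the suggestion implies.

```c
/* Hypothetical sketch only; not the actual gcc.target/i386/pr86722.c.  */
/* { dg-do compile } */
/* { dg-options "-O2 -msse2 -mfpmath=sse" } */

/* The example from the patch description.  x + 0.0 cannot be folded to
   x in general (for x == -0.0 the sum is +0.0 under the default
   rounding mode), so the compiler has to materialize the constant 0.0
   either from the constant pool or in a register.  */
double
foo (double x)
{
  return x + 0.0;
}
```

With -msse2 -mfpmath=sse, double arithmetic should be done in SSE registers on 32-bit x86 as well, so the same code generation can be checked there instead of skipping the target.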