On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping instructions.
> >
> > The new option is enabled by default, because even with sanitization,
> > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > benchmark can be achieved vs. scalar code.
> >
> > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > vs. scalar code. This is what clang does by default, as it defaults
> > to -fno-trapping-math.
>
> I like the new option, note you lack invoke.texi documentation where
> I'd also elaborate a bit on the interaction with -fno-trapping-math
> and the possible performance impact when NaNs or denormals leak
> into the upper halves and cross-reference -mdaz-ftz.

Yes, this is my plan (lack of documentation is due to RFC status of
the patch). OTOH, Hongtao has some other ideas in the PR, so I'll wait
with the patch a bit.

Thanks,
Uros.

> Thanks,
> Richard.
>
> > PR target/110832
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (TARGET_MMXFP_WITH_SSE): New macro.
> > * config/i386/i386.opt (mmmxfp-with-sse): New option.
> > * config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
> > upper part of V2SFmode register with -fno-trapping-math.
> > (<plusminusmult:insn>v2sf3): Enable for TARGET_MMXFP_WITH_SSE.
> > (divv2sf3): Ditto.
> > (<smaxmin:code>v2sf3): Ditto.
> > (sqrtv2sf2): Ditto.
> > (*mmx_haddv2sf3_low): Ditto.
> > (*mmx_hsubv2sf3_low): Ditto.
> > (vec_addsubv2sf3): Ditto.
> > (vec_cmpv2sfv2si): Ditto.
> > (vcond<V2FI:mode>v2sf): Ditto.
> > (fmav2sf4): Ditto.
> > (fmsv2sf4): Ditto.
> > (fnmav2sf4): Ditto.
> > (fnmsv2sf4): Ditto.
> > (fix_truncv2sfv2si2): Ditto.
> > (fixuns_truncv2sfv2si2): Ditto.
> > (floatv2siv2sf2): Ditto.
> > (floatunsv2siv2sf2): Ditto.
> > (nearbyintv2sf2): Ditto.
> > (rintv2sf2): Ditto.
> > (lrintv2sfv2si2): Ditto.
> > (ceilv2sf2): Ditto.
> > (lceilv2sfv2si2): Ditto.
> > (floorv2sf2): Ditto.
> > (lfloorv2sfv2si2): Ditto.
> > (btruncv2sf2): Ditto.
> > (roundv2sf2): Ditto.
> > (lroundv2sfv2si2): Ditto.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Uros.
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
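
For readers following the thread, here is a rough illustration of why the upper half matters when a V2SF operation is carried out as a full V4SF instruction. It is only a sketch, not code from the patch: the helper name add_low_halves and its sanitize parameter are made up for this example, and it assumes an x86-64 target where <xmmintrin.h> and the MXCSR status flags are available. The helper adds two 2-float vectors with a 4-lane ADDPS and reports which exception flags the instruction raised, with the upper lanes either left as they are or zeroed first.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics and MXCSR helpers (x86 only) */

/* Hypothetical helper, for illustration only.
   a4 and b4 each hold four floats: lanes 0-1 are the meaningful V2SF
   values, lanes 2-3 model whatever happens to sit in the upper half of
   the 128-bit register.  If sanitize is nonzero, lanes 2-3 are zeroed
   before the add.  The two meaningful results go to out2, and the MXCSR
   exception flags raised by the 4-lane add are returned.
   The volatile qualifiers keep the compiler from folding the add at
   compile time or moving it across the MXCSR accesses. */
static unsigned int add_low_halves(const volatile float *a4,
                                   const volatile float *b4,
                                   int sanitize,
                                   volatile float *out2)
{
    float a[4], b[4], r[4];

    assert(a4 != NULL && b4 != NULL && out2 != NULL);

    _MM_SET_EXCEPTION_STATE(0);          /* clear sticky exception flags */

    for (int i = 0; i < 4; i++) {
        int upper = (i >= 2);
        a[i] = (sanitize && upper) ? 0.0f : a4[i];
        b[i] = (sanitize && upper) ? 0.0f : b4[i];
    }

    /* One ADDPS over all four lanes, including the two we do not need. */
    _mm_storeu_ps(r, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));

    out2[0] = r[0];
    out2[1] = r[1];

    return _MM_GET_EXCEPTION_STATE();    /* flags raised since the clear */
}
```

With lower lanes {1, 2} and {3, 4} and upper lanes holding +Inf and -Inf, the unsanitized call returns a value with _MM_EXCEPT_INVALID set even though the two meaningful results (4 and 6) are exact, while the sanitized call returns 0. That spurious flag, which becomes a trap when the exception is unmasked, is what the sanitization discussed above avoids, and it is the cost that -fno-trapping-math lets the compiler skip.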