On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping instructions.
> >
> > The new option is enabled by default, because even with sanitization,
> > a small but consistent speedup of 2 to 3% on the Polyhedron capacita
> > benchmark can be achieved vs. scalar code.
> >
> > Using -fno-trapping-math improves Polyhedron capacita runtime by 8 to 9%
> > vs. scalar code.  This is what clang does by default, as it defaults
> > to -fno-trapping-math.
>
> I like the new option.  Note that it lacks invoke.texi documentation, where
> I'd also elaborate a bit on the interaction with -fno-trapping-math
> and the possible performance impact when NaNs or denormals leak
> into the upper halves, and cross-reference -mdaz-ftz.

Yes, this is my plan (the documentation is missing because the patch is
still an RFC). OTOH, Hongtao has some other ideas in the PR, so I'll hold
off on the patch for a bit.

Thanks,
Uros.

> Thanks,
> Richard.
>
> >     PR target/110832
> >
> > gcc/ChangeLog:
> >
> >     * config/i386/i386.h (TARGET_MMXFP_WITH_SSE): New macro.
> >     * config/i386/i386.opt (mmmxfp-with-sse): New option.
> >     * config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
> >     upper part of V2SFmode register with -fno-trapping-math.
> >     (<plusminusmult:insn>v2sf3): Enable for TARGET_MMXFP_WITH_SSE.
> >     (divv2sf3): Ditto.
> >     (<smaxmin:code>v2sf3): Ditto.
> >     (sqrtv2sf2): Ditto.
> >     (*mmx_haddv2sf3_low): Ditto.
> >     (*mmx_hsubv2sf3_low): Ditto.
> >     (vec_addsubv2sf3): Ditto.
> >     (vec_cmpv2sfv2si): Ditto.
> >     (vcond<V2FI:mode>v2sf): Ditto.
> >     (fmav2sf4): Ditto.
> >     (fmsv2sf4): Ditto.
> >     (fnmav2sf4): Ditto.
> >     (fnmsv2sf4): Ditto.
> >     (fix_truncv2sfv2si2): Ditto.
> >     (fixuns_truncv2sfv2si2): Ditto.
> >     (floatv2siv2sf2): Ditto.
> >     (floatunsv2siv2sf2): Ditto.
> >     (nearbyintv2sf2): Ditto.
> >     (rintv2sf2): Ditto.
> >     (lrintv2sfv2si2): Ditto.
> >     (ceilv2sf2): Ditto.
> >     (lceilv2sfv2si2): Ditto.
> >     (floorv2sf2): Ditto.
> >     (lfloorv2sfv2si2): Ditto.
> >     (btruncv2sf2): Ditto.
> >     (roundv2sf2): Ditto.
> >     (lroundv2sfv2si2): Ditto.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Uros.
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
