On Tue, Aug 8, 2023 at 12:08 PM Richard Biener <rguent...@suse.de> wrote:
> > > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > > named patterns in order to avoid generation of partial vector
> > > > > > V4SFmode trapping instructions.
> > > > > >
> > > > > > The new option is enabled by default, because even with
> > > > > > sanitization, a small but consistent speed up of 2 to 3% with
> > > > > > Polyhedron capacita benchmark can be achieved vs. scalar code.
> > > > > >
> > > > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to
> > > > > > 9% vs. scalar code. This is what clang does by default, as it
> > > > > > defaults to -fno-trapping-math.
> > > > >
> > > > > I like the new option, note you lack invoke.texi documentation where
> > > > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > > > and the possible performance impact when NaNs or denormals leak
> > > > > into the upper halves and cross-reference -mdaz-ftz.
> > > >
> > > > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > > > option. It is written in a way to also cover half-float vectors. WDYT?
> > >
> > > "generate trapping floating-point operations"
> > >
> > > I'd say "generate floating-point operations that might affect the
> > > set of floating point status flags", the word "trapping" is IMHO
> > > misleading.
> > > Not sure if "set of floating point status flags" is the correct term,
> > > but it's what the C standard seems to refer to when talking about
> > > things you get with fegetexceptflag. feraiseexcept refers to
> > > "floating-point exceptions". Unfortunately the -fno-trapping-math
> > > documentation is similarly confusing (and maybe even wrong, I read
> > > it to conform to 'non-stop' IEEE arithmetic).
> >
> > Thanks for suggesting the right terminology.
> > I think that:
> >
> > +@opindex mpartial-vector-math
> > +@item -mpartial-vector-math
> > +This option enables GCC to generate floating-point operations that might
> > +affect the set of floating point status flags on partial vectors, where
> > +vector elements reside in the low part of the 128-bit SSE register. Unless
> > +@option{-fno-trapping-math} is specified, the compiler guarantees correct
> > +behavior by sanitizing all input operands to have zeroes in the unused
> > +upper part of the vector register. Note that by using built-in functions
> > +or inline assembly with partial vector arguments, NaNs, denormal or invalid
> > +values can leak into the upper part of the vector, causing possible
> > +performance issues when @option{-fno-trapping-math} is in effect. These
> > +issues can be mitigated by manually sanitizing the upper part of the partial
> > +vector argument register or by using @option{-mdaz-ftz} to set
> > +denormals-are-zero (DAZ) flag in the MXCSR register.
> >
> > now explains in adequate detail what the option does. IMO, the
> > "floating-point operations that might affect the set of floating point
> > status flags" correctly identifies affected operations, so an example,
> > as suggested below, is not necessary.
> >
> > > I'd maybe give an example of a FP operation that's _not_ affected
> > > by the flag (copysign?).
> >
> > Please note that I have renamed the option to "-mpartial-vector-math"
> > with a short target-specific description:
>
> Ah yes, that's a less confusing name but then it might suggest
> that -mno-partial-vector-math would disable all of that, including
> integer ops, not only the patterns possibly affecting the exception
> flags? Note I don't have a better suggestion and this is clearly
> better than the one mentioning mmx.

You are right, I think I'll rename the option to -mpartial-vector-fp-math.

Thanks,
Uros.
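
For readers unfamiliar with the term, "sanitizing the upper part" of a
partial vector can be pictured at the source level roughly as below. This is
only a sketch, assuming the GCC/Clang vector_size extension; the type and
function names (v2sf, v4sf, widen_sanitized, add_v2sf) are made up for
illustration, and it is not the code the patch emits.

```c
/* Illustrative sketch only: all names here are hypothetical. */
typedef float v2sf __attribute__((vector_size(8)));   /* partial vector, 2 floats */
typedef float v4sf __attribute__((vector_size(16)));  /* full 128-bit vector      */

/* Widen a 2-element vector to 4 elements, forcing the unused upper
   lanes to zero so that no stale NaN or denormal value takes part in
   the full-width operation that follows.  */
static v4sf widen_sanitized(v2sf lo)
{
    v4sf full = { lo[0], lo[1], 0.0f, 0.0f };
    return full;
}

/* Add two partial vectors with a full-width addition on sanitized
   inputs, then keep only the two low lanes of the result.  */
static v2sf add_v2sf(v2sf a, v2sf b)
{
    v4sf sum = widen_sanitized(a) + widen_sanitized(b);
    v2sf result = { sum[0], sum[1] };
    return result;
}
```

Zero is a safe filler for addition because 0 + 0 raises no exception flag;
other operations may need a different filler value in the unused lanes (for
example, a zero divisor would not be safe for division).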