On Tue, 8 Aug 2023, Uros Bizjak wrote:

> On Tue, Aug 8, 2023 at 10:07 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Mon, 7 Aug 2023, Uros Bizjak wrote:
> >
> > > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguent...@suse.de> wrote:
> > > >
> > > > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> > > >
> > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > > > named patterns in order to avoid generation of partial vector V4SFmode
> > > > > trapping instructions.
> > > > >
> > > > > The new option is enabled by default, because even with sanitization,
> > > > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > > > benchmark can be achieved vs. scalar code.
> > > > >
> > > > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > > > vs. scalar code. This is what clang does by default, as it defaults
> > > > > to -fno-trapping-math.
> > > >
> > > > I like the new option, note you lack invoke.texi documentation where
> > > > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > > > and the possible performance impact when NaNs or denormals leak
> > > > into the upper halves and cross-reference -mdaz-ftz.
> > >
> > > The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> > > option. It is written in a way to also cover half-float vectors. WDYT?
> >
> > "generate trapping floating-point operations"
> >
> > I'd say "generate floating-point operations that might affect the
> > set of floating point status flags", the word "trapping" is IMHO
> > misleading.
> > Not sure if "set of floating point status flags" is the correct term,
> > but it's what the C standard seems to refer to when talking about
> > things you get with fegetexceptflag. feraiseexcept refers to
> > "floating-point exceptions". Unfortunately the -fno-trapping-math
> > documentation is similarly confusing (and maybe even wrong, I read
> > it to conform to 'non-stop' IEEE arithmetic).
>
> Thanks for suggesting the right terminology. I think that:
>
> +@opindex mpartial-vector-math
> +@item -mpartial-vector-math
> +This option enables GCC to generate floating-point operations that might
> +affect the set of floating point status flags on partial vectors, where
> +vector elements reside in the low part of the 128-bit SSE register. Unless
> +@option{-fno-trapping-math} is specified, the compiler guarantees correct
> +behavior by sanitizing all input operands to have zeroes in the unused
> +upper part of the vector register. Note that by using built-in functions
> +or inline assembly with partial vector arguments, NaNs, denormal or invalid
> +values can leak into the upper part of the vector, causing possible
> +performance issues when @option{-fno-trapping-math} is in effect. These
> +issues can be mitigated by manually sanitizing the upper part of the partial
> +vector argument register or by using @option{-mdaz-ftz} to set
> +denormals-are-zero (DAZ) flag in the MXCSR register.
>
> Now explain in adequate detail what the option does. IMO, the
> "floating-point operations that might affect the set of floating point
> status flags" correctly identifies affected operations, so an example,
> as suggested below, is not necessary.
>
> > I'd maybe give an example of a FP operation that's _not_ affected
> > by the flag (copysign?).
>
> Please note that I have renamed the option to "-mpartial-vector-math"
> with a short target-specific description:

Ah yes, that's a less confusing name but then it might suggest
that -mno-partial-vector-math would disable all of that, including
integer ops, not only the patterns possibly affecting the exception
flags? Note I don't have a better suggestion and this is clearly
better than the one mentioning mmx.

> +partial-vector-math
> +Target Var(ix86_partial_vec_math) Init(1)
> +Enable floating-point status flags setting SSE vector operations on partial vectors
>
> which I think summarises the option (without the word "trapping"). The
> same approach will be taken for Float16 operations, so the approach is
> not specific to MMX vectors.
>
> > Otherwise it looks OK to me.
>
> Thanks, I have attached the RFC V2 patch; I plan to submit a formal
> patch later today.

Thanks. With AVX512VL there might also be the option to use
a mask (with the penalty of a very much larger instruction
encoding).

Richard.
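
For readers following the thread, here is a minimal sketch of the
status-flag concern discussed above. It is not taken from the patch. It
assumes an x86 target with SSE and the default MXCSR state (all exceptions
masked), and the helper names make_partial, sanitize_upper and
add_sets_overflow are hypothetical, chosen for illustration only. A
two-element vector is kept in the low half of a 128-bit register and a
full-width add is executed; leftover values in the upper half can then set
a sticky flag that the two live elements never would.

```c
#include <float.h>
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target */

/* Hypothetical helper: place {x0, x1} in the low half of a 128-bit
   register and fill both unused upper lanes with `upper`. */
static __m128 make_partial(float x0, float x1, float upper)
{
    /* _mm_set_ps takes lanes from highest to lowest. */
    return _mm_set_ps(upper, upper, x1, x0);
}

/* Hypothetical helper: zero the unused upper half, keeping lanes 0 and 1.
   This mirrors the "zeroes in the unused upper part" sanitization that
   the documentation text describes. */
static __m128 sanitize_upper(__m128 v)
{
    return _mm_movelh_ps(v, _mm_setzero_ps());
}

/* Hypothetical helper: run a full-width (four-lane) add and report
   whether the sticky overflow flag in MXCSR was set by it. */
static int add_sets_overflow(__m128 a, __m128 b)
{
    /* Volatile copies keep the compiler from folding the add at compile
       time or moving it across the MXCSR accesses. */
    volatile __m128 va = a, vb = b;
    volatile __m128 sum;

    _MM_SET_EXCEPTION_STATE(0);          /* clear all sticky flags */
    sum = _mm_add_ps(va, vb);            /* operates on all four lanes */
    (void)sum;
    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW) != 0;
}
```

With a = make_partial(1.0f, 2.0f, FLT_MAX) and
b = make_partial(3.0f, 4.0f, FLT_MAX), add_sets_overflow(a, b) should
report the overflow flag even though the two live elements add up to
4 and 6, while the same call on sanitize_upper(a) and sanitize_upper(b)
should not.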