On Fri, Jan 13, 2017 at 12:43 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 13.01.2017 00:20, Ilia Mirkin wrote:
>>
>> On Thu, Jan 12, 2017 at 6:04 PM, Nicolai Hähnle <nhaeh...@gmail.com>
>> wrote:
>>>
>>> On 12.01.2017 23:46, Ilia Mirkin wrote:
>>>>
>>>> On Thu, Jan 12, 2017 at 4:03 PM, Matteo Bruni <matteo.myst...@gmail.com>
>>>> wrote:
>>>>>
>>>>> So, what would be really nice to have is a GLSL extension for some
>>>>> kind of switch to select the requested behavior WRT NaN. For example a
>>>>> three-way option with "don't generate NaN in arithmetic operations",
>>>>> "do generate NaN" and "don't care". It could also be a GL state if
>>>>> that's easier to implement with the existing hardware, since an
>>>>> individual application isn't supposed to require different behavior
>>>>> from one shader to the next.
>>>>>
>>>>> Is anyone interested in / favorable to something like this? It would
>>>>> solve the issue with defining NaN behavior in GLSL while making things
>>>>> a bit more compatible with "other API a lot of games are ported from
>>>>> which happens to be supported by all the desktop GPUs".
>>>>
>>>> Not that I'm biased, but on the NVIDIA Tesla series (G80-GT21x), this
>>>> enable is handled via a global flag, not in the shader binary, so this
>>>> is all-or-nothing for a whole pipeline. On GF100+, I believe there is
>>>> also an enable via a global flag, but there are also a FMUL.FMZ (and
>>>> FFMA.FMZ) flag, which I *think* has the same effect. So for GF100+ hw,
>>>> this could be done at the instruction level.
>>>
>>> Well, I would also have advocated for what is effectively a
>>> per-program/pipeline flag anyway, even though GCN hardware can
>>> theoretically do it per-instruction. Tracking a per-instruction bit in
>>> the compiler quickly becomes fragile (e.g. there's no good way for us
>>> to model this information per-instruction in LLVM IR). Per-shader isn't
>>> any better than per-instruction due to linking, and per-shader-stage is
>>> awkward if we ever want to do fancier cross-stage optimizations.
>>>
>>> It's really quite simple. Introduce an extension with a name like
>>> MESA_shader_float_dx9. The behavior I'd suggest is:
>>>
>>> Enabling/requiring the extension in a shader causes various semantics
>>> changes to bring floating point behavior in line with DX9 in that
>>> shader's code:
>>>
>>> - 0*x = 0
>>
>> Yes. But only for fp32, not for fp64.
>>
>>> - sqrt/rsqrt are guaranteed to take the absolute value of their argument
>>
>> Is that necessary? If the software knows about the ext, it also knows
>> to stick the abs() in.
>>
>>> - anything else?
>>
>> I'd say MESA_shader_float_zero_wins or something, if we don't stick
>> the sqrt stuff into it.
>
> Well, I don't know the intricacies of DX9. I agree that apps can do the
> abs() themselves, so if the 0*x behavior is really the only other
> difference, then zero_wins is a fine name as well.
>
> Cheers,
> Nicolai
>
>>
>> Here is a software model of the Tesla-era shader execution created by
>> Marcin (mwk):
>>
>> https://github.com/envytools/envytools/blob/master/nvhw/fp.c#L168
>>
>> The bit in question is "zero_wins", so just look at what that
>> modifies. So if you have a*b and a || b == 0 (i.e. they are +0 or -0)
>> then +0 is returned.
>>
>>>
>>> It is a link error to link a program in which some shaders have the
>>> extension and others don't.
>>>
>>> There's funny interactions like having compute shaders with dx9 float
>>> semantics, but that's pretty much it :)
>>>
>>> Somebody just needs to write up a draft, but it only makes sense if we
>>> can at least get all the Mesa drivers and Wine behind this. If there's
>>> a simple global flag on NVidia hardware, then it should be easy to
>>> provide an initial implementation for nouveau *hint* *hint* ;)
>>>
>>> For radeonsi, more work is required (like support in LLVM).
>>
>> All sounds good to me. Should be relatively straightforward for nouveau.
Wine can also do what Nine does. Its RSQ implementation is:

    min(FLT_MAX, rsq(abs(x)))

That min() expression gets rid of +inf and also NaNs, because a
non-NaN number wins (here FLT_MAX) on radeonsi.

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev