On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote: > Am 26.02.2016 um 11:25 schrieb Iago Toral: > > > > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote: > >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <curroje...@riseup.net> > >> wrote: > >>> Ian Romanick <i...@freedesktop.org> writes: > >>> > >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote: > >>>>> Ian Romanick <i...@freedesktop.org> writes: > >>>>> > >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote: > >>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga: > >>>>>>>> From the OpenGL 4.2 spec: > >>>>>>>> > >>>>>>>> "When a constructor is used to convert any integer or floating-point > >>>>>>>> type to a > >>>>>>>> bool, 0 and 0.0 are converted to false, and non-zero values are > >>>>>>>> converted to > >>>>>>>> true." > >>>>>>>> > >>>>>>>> Thus, even the smallest non-zero floating value should be translated > >>>>>>>> to true. > >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers. > >>>>>>>> > >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with > >>>>>>>> floats, > >>>>>>>> subject to floating-point precision limitations, and as a result, > >>>>>>>> relatively > >>>>>>>> small non-zero floating point numbers return false instead of true. > >>>>>>>> > >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to > >>>>>>>> cover the case > >>>>>>>> of -0.0) and testing the result against 0u using an integer > >>>>>>>> comparison instead. > >>>>>>>> --- > >>>>>>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++--- > >>>>>>>> 1 file changed, 12 insertions(+), 3 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >>>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >>>>>>>> index db20c71..7d62d7e 100644 > >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, > >>>>>>>> nir_alu_instr *instr) > >>>>>>>> bld.MOV(result, negate(op[0])); > >>>>>>>> break; > >>>>>>>> > >>>>>>>> - case nir_op_f2b: > >>>>>>>> - bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ); > >>>>>>>> - break; > >>>>>>>> + case nir_op_f2b: { > >>>>>>>> + /* Because comparing to 0.0f is subject to precision > >>>>>>>> limitations, do the > >>>>>>>> + * comparison using integers (we need to get rid of the sign > >>>>>>>> bit for that) > >>>>>>>> + */ > >>>>>>>> + if (devinfo->gen >= 8) > >>>>>>>> + op[0] = resolve_source_modifiers(op[0]); > >>>>>>>> + op[0] = retype(op[0], BRW_REGISTER_TYPE_UD); > >>>>>>>> + bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu)); > >>>>>>>> + bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ); > >>>>>>>> + break; > >>>>>>>> + } > >>>>>>>> + > >>>>>>>> case nir_op_i2b: > >>>>>>>> bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ); > >>>>>>>> break; > >>>>>>>> > >>>>>>> > >>>>>>> Does that fix anything? I don't really see a problem with the existing > >>>>>>> logic. Yes any "non-zero value" should be converted to true. But > >>>>>>> surely > >>>>>>> that definition cannot include denorms, which you are always allowed > >>>>>>> to > >>>>>>> flush to zero. > >>>>>>> (Albeit I can't tell what the result would be with NaNs with the float > >>>>>>> compare, nor what the result actually should be in this case since > >>>>>>> glsl > >>>>>>> doesn't require NaNs neither.) > >>>>>> > >>>>>> Based on this and Jason's comments, I think we need a bunch of new > >>>>>> tests. > >>>>>> > >>>>>> - smallest positive normal number > >>>>>> - abs of smallest positive normal number > >>>>>> - neg of " " " " > >>>>>> - largest positive subnormal number > >>>>>> - abs of largest positive subnormal number > >>>>>> - neg of " " " " > >>>>>> - all of the above with negative numbers > >>>>>> - NaN > >>>>>> - abs of NaN > >>>>>> - neg of " > >>>>>> > >>>>>> Perhaps others? +/-Inf just for kicks? > >>>>> > >>>>> What's the point? The result of most of the above (except possibly > >>>>> bool(NaN)) is undefined by the spec: > >>>>> > >>>>> "Any denormalized value input into a shader or potentially generated by > >>>>> any operation in a shader can be flushed to 0. [...] NaNs are not > >>>>> required to be generated. [...] Operations and built-in functions that > >>>>> operate on a NaN are not required to return a NaN as the result." > >>>> > >>>> Except that apparently one major OpenGL vendor does something well > >>>> defined that's different than what we do. > >>> > >>> I'm skeptical that nVidia would treat single-precision denorms > >>> inconsistently between datatype constructors and other floating-point > >>> arithmetic, but assuming that's the case it would be an argument for > >>> proposing the spec change to Khronos rather than introducing a dubiously > >>> compliant change into the back-end. I think I would argue against > >>> making such a change in the spec in any case, because even though it's > >>> implementation-defined whether denorms are flushed or not, the following > >>> is guaranteed by the spec AFAIUI: > >>> > >>> | if (bool(f)) > >>> | random_arithmetic_on(f /* Guaranteed not to be zero here even if > >>> | denorms are flushed */); > >>> > >>> With this change in, bool(f) would evaluate to true even if f is a > >>> denorm which is flushed to zero for all subsequent arithmetic in the > >>> block. For that reason this seems more likely to break stuff than to > >>> fix stuff, it's not like applications can expect denorms to be > >>> representable at all regardless of what nVidia does, but they can expect > >>> the above GLSL source to work. > >> > >> I'd be curious to see a shader test that triggers the relevant > >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a > >> per-instruction setting though [since Fermi], so it's technically > >> feasible not to flush sometimes and to flush other times. > > > > This one: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e= > > > > > > That test is for doubles. The author of the test told me that the > > equivalent version for float also passed, but I have just verified that > > he made a mistake. In summary: that test passes for double and fails for > > float... > > > > The quote from the spec shared by Curro says: > > > > "Any denormalized value input into a shader or potentially generated by > > any operation in a shader can be flushed to 0" > > > > It says _can_ instead of _must_, so I share Curro's point of view that > > this is in fact undefined behavior by the spec and f2b of a denorm could > > return true or false and both are valid implementations. I wonder if > > nvidia's case of flushing floats but not doubles is also acceptable > > though, sounds like a bug to me. > > GL is always "can" for such things, and never "must" :-). > (d3d10 would require "must be flushed" for floats.) > Flushing denorms for floats but not for doubles looks perfectly fine to > me too, there's quite some hw which can't do denorms for floats (at > least not without compromises) but can do it for doubles (because > doubles had a low rate and were only useful for HPC, whereas floats need > to favor performance over everything else, essentially).
Ah, interesting, thanks for sharing this! In that case I wonder what we should do for doubles in the case of i965. I have patches for both implementations, so which one do we want? Iago _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev