On Fri, Feb 5, 2016 at 7:11 PM, Roland Scheidegger <srol...@vmware.com> wrote: > Am 05.02.2016 um 17:04 schrieb Marek Olšák: >> On Fri, Feb 5, 2016 at 4:48 PM, Roland Scheidegger <srol...@vmware.com> >> wrote: >>> Am 05.02.2016 um 16:08 schrieb Marek Olšák: >>>> On Fri, Feb 5, 2016 at 3:56 PM, Roland Scheidegger <srol...@vmware.com> >>>> wrote: >>>>> Am 05.02.2016 um 15:44 schrieb Marek Olšák: >>>>>> On Fri, Feb 5, 2016 at 10:57 AM, Marek Olšák <mar...@gmail.com> wrote: >>>>>>> On Fri, Feb 5, 2016 at 1:55 AM, Matt Turner <matts...@gmail.com> wrote: >>>>>>>> On Thu, Feb 4, 2016 at 10:50 AM, Marek Olšák <mar...@gmail.com> wrote: >>>>>>>>> From: Marek Olšák <marek.ol...@amd.com> >>>>>>>>> >>>>>>>>> This is a subset of the generated tests which are known to fail >>>>>>>>> on everything except CPU emulation (AFAIK). >>>>>>>>> --- >>>>>>>> >>>>>>>> This is really awful. Committing a generated test, but with unknown >>>>>>>> bits chopped out is gross. >>>>>>>> >>>>>>>> If it were me, I'd want to understand why my hardware behaved >>>>>>>> differently -- not just hack up *different* tests and claim victory. >>>>>>>> >>>>>>>> FWIW, the generated tests pass on all Intel hardware exposing >>>>>>>> ARB_shading_language_packing. Gen7+ has native half-float support, and >>>>>>>> Gen6 uses the lowering code in lower_packing_builtins.cpp to turn the >>>>>>>> built-ins into a pile of instructions. >>>>>>>> >>>>>>>> If you can identify how AMD hardware behaves differently and can prove >>>>>>>> that the generator needs to be relaxed or something, that's cool. But >>>>>>>> as is, I hate this patch. >>>>>>>> >>>>>>>> I can't find anything in the AMD docs (I looked at GCN3) about >>>>>>>> half-precision support, so I can't check my theory that AMD hardware >>>>>>>> rounds towards zero instead of to-nearest/even like Intel. >>>>>>> >>>>>>> Since the tests only fail with very small numbers, I think the problem >>>>>>> is that denorms are disabled by radeonsi. I can try to confirm that. >>>>>>> >>>>>>> The hardware rounds to nearest even. The hw precision is: >>>>>>> - unpack functions - 0 ULP >>>>>>> - pack functions = 0.5 ULP >>>>>>> - input and output denorms are flushed to 0 >>>>>> >>>>>> Hey Matt, >>>>>> >>>>>> I have just confirmed that I was right. After I enable denormals in >>>>>> hw, the original tests pass. This means that this patch tests the >>>>>> packing functions but skips denormals. >>>>>> >>>>>> Not so awful now, is it? :) >>>>>> >>>>>> Sadly, I can't enable denormals on all chips, because they are slow. >>>>>> >>>>>> So if I add "-no-denormals" suffix into the test names, I can push this, >>>>>> right? >>>>>> >>>>> >>>>> Can't you hack up the generator instead? By the looks of it >>>>> (gen_builtin_packing_tests.py) it has a list of values which result in >>>>> denorm f16 values (make_inputs_for_pack_half_2x16). Presumably you could >>>>> add a test there which uses a different list, not including them. >>>>> >>>>> (That said, I'm a bit surprised for conversion to/from fp16 your hw >>>>> doesn't do fp16 denorms - they'd be required by d3d(11) as well.) >>>> >>>> The hardware can do denormals, they are just slow on all chips except VI. >>>> >>>> GLSL 4.50 doesn't require denormals, thus piglit shouldn't even contain >>>> tests for them. >>>> >>> >>> I'm not asking about denormals for ordinary operations, just conversion >>> to fp16 (any fp16 denorm is a fp32 normal). That would be along similar >>> lines what d3d10 already required - forbids fp32 denorms, but for >>> instance sampling fp16 surfaces requires you to handle the fp16 denorms >>> without flushing to zero (that's at least what the docs say - maybe you >>> can get away without it...). FWIW x86 half-float conversion instructions >>> (vcvtph2ps, vcvtps2ph) work that way too - even if you have denorms >>> disabled, that instruction will still produce fp16 denorms (and convert >>> fp16 denorms to fp32 normals), you can configure rounding mode but >>> there's no way to disable fp16 denorms. >>> Albeit gcn3 supports fp16 natively (that is it has actual operations >>> using them not just conversion), so I suppose it makes sense it would >>> flush them on conversion too... >>> But I think you're right glsl shouldn't require them (albeit it is >>> pretty silent on that topic as far as I can tell - certainly doesn't >>> require them for fp32). >> >> There is no point in enabling denormals for the conversion >> instructions if all previous FP instructions flush denormals to 0 and >> so do the following FP instructions. > I think you didn't get the point: any fp16 denormal is a fp32 normal > (and similarly, any fp32 denormal is fp16 zero). And glsl (well the > non-es variants anyway) don't deal with anything but fp32 really, you > cannot manipulate these packed fp16 values any other way than unpacking > to fp32 (well you can of course, but not really in a "float way"). > Thus, even if you have no denormals for fp32, fp16 denorms are still > useful for packing, unpacking.
OK. I get your point now. Marek _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev