On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick <i...@freedesktop.org> wrote:
> On 01/12/2016 05:41 PM, Matt Turner wrote:
> > On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner <matts...@gmail.com> wrote:
> >>>
> >>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >>>> This opcode simply takes a 32-bit floating-point value and reduces its
> >>>> effective precision to 16 bits.
> >>>> ---
> >>>
> >>> What's it supposed to do for values not representable in half-precision?
> >>
> >> If they're in-range, round. If they're out-of-range, the appropriate
> >> infinity.
> >
> > Are you sure that's the behavior hardware has? And by "are you sure" I
> > mean "have you tested it"
> >
> > The conversion table in the f32to16 documentation in the IVB PRM says:
> >
> >     single precision -> half precision
> >     ------------------------------------
> >     -finite          -> -finite/-denorm/-0
> >     +finite          -> +finite/+denorm/+0
> >
> >> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16
> >
> >> Quantize a floating-point value to a what is expressible by a 16-bit floating-point value.
> >
> > Erf, anyway,
> >
> > ... and the "convert too-large values to inf" isn't the behavior of
> > other languages like C [1] (and I don't think GLSL either, but I can't
> > find anything on the matter in the spec) or OpenCL C [2].
>
> Some background may either clarify or further muddy things.
>
> Right now applications sprinkle mediump and lowp all over the place in
> GLSL ES shaders. Many vertex shader implementations, even on mobile
> devices, do everything in single precision. Many devices will only use
> f16 part of the time because some instructions may not have f16
> versions. When we finally implement f16 in the i965 driver, we'll be in
> this boat too.
>
> As a result, people think that their mediump-decorated code is fine...
> until it actually runs on a device that really does mediump.
> Then they report a bug to the vendor of that hardware. Sound like a
> familiar situation?
>
> From this problem the OpQuantizeToF16 SPIR-V instruction was born. The
> intention is that people could compile their code in a way that mediump
> gives you mediump precision on every device. While you probably
> wouldn't want to ship such code, this at least makes it possible to test
> it without having to find a device that will really do native mediump
> calculations all the time.
>
> IIRC, GLSL doesn't require Inf in mediump. I don't recall what SPIR-V
> says. I believe that GLSL allows saturating to the maximum magnitude
> representable value. What we want is for an expression tree like
>
>     OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))
>
> to produce the same value that 'x + y' would produce in "real" f16 mediump.

Right. This is exactly why the opcode was created.

> The SPIR-V +/-Inf requirement doesn't completely jibe with my
> recollection of the discussions... but there was a lot of
> back-and-forth, and it was quite a few months ago at this point. I
> think we may have picked just one possible answer instead of allowing
> both choices just for consistency. I don't have any memory whether
> anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.

For OpQuantizeToF16, the spec does currently

> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> > touch directly on the issue at hand.
> >
> > I'm worried that what is specified is not implementable via a round
> > trip through half-precision, because it's not the behavior other
> > languages implement.
> >
> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
> > out-of-range single-precision floats are converted to the
> > half-precision value with the largest magnitude.
>
> You are correct, we should test it to be sure what the hardware really
> does. This is not intended to be a performance operation.
> If we need to use a different, more expensive expansion to meet the
> requirements, we shouldn't lose any sleep over it.

I haven't looked at it in bit-for-bit detail, but I did run it through a set of tests that explicitly hit denorms and the out-of-bounds cases in both directions. The tests seem to indicate that the hardware does what the opcode claims.

--Jason
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev