glsl450: rewrite atan2 to deal with infinities

Francisco Jerez Tue, 24 Jan 2017 01:23:09 -0800

Francisco Jerez <curroje...@riseup.net> writes:

> "Juan A. Suarez Romero" <jasua...@igalia.com> writes:
>
>> On Sun, 2017-01-22 at 00:20 -0800, Francisco Jerez wrote:
>>> "Juan A. Suarez Romero" <jasua...@igalia.com> writes:
>>> 
>>> > Rewrite atan2(y,x) to cover (+/-)INF values.
>>> > 
>>> > This fixes several test cases in Vulkan CTS
>>> > (dEQP-VK.glsl.builtin.precision.atan2.*)
>>> > 
>>> > v2: do not flush denorms to 0 (jasuarez)
>>> > ---
>>> >  src/compiler/spirv/vtn_glsl450.c | 48 
>>> > +++++++++++++++++++++++++++++++++++-----
>>> >  1 file changed, 42 insertions(+), 6 deletions(-)
>>> > 
>>> > diff --git a/src/compiler/spirv/vtn_glsl450.c 
>>> > b/src/compiler/spirv/vtn_glsl450.c
>>> > index 0d32fddbef..d52a22c0c3 100644
>>> > --- a/src/compiler/spirv/vtn_glsl450.c
>>> > +++ b/src/compiler/spirv/vtn_glsl450.c
>>> > @@ -299,18 +299,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
>>> >     return nir_fmul(b, tmp, nir_fsign(b, y_over_x));
>>> >  }
>>> >  
>>> > +/*
>>> > + * Computes atan2(y,x)
>>> > + */
>>> >  static nir_ssa_def *
>>> >  build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
>>> >  {
>>> >     nir_ssa_def *zero = nir_imm_float(b, 0.0f);
>>> > -
>>> > -   /* If |x| >= 1.0e-8 * |y|: */
>>> > -   nir_ssa_def *condition =
>>> > -      nir_fge(b, nir_fabs(b, x),
>>> > -              nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));
>>> > +   nir_ssa_def *inf = nir_imm_float(b, INFINITY);
>>> > +   nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY);
>>> > +   nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f),
>>> > +                                       nir_imm_float(b, M_PI_4f));
>>> > +
>>> > +   /* if y == +-INF */
>>> > +   nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf);
>>> > +
>>> > +   /* if x == +-INF */
>>> > +   nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf);
>>> > +
>>> > +   /* Case: y is +-INF */
>>> > +   nir_ssa_def *y_is_inf_then =
>>> > +      nir_fmul(b, nir_fsign(b, y),
>>> > +                  nir_bcsel(b, nir_feq(b, x, inf),
>>> > +                               nir_imm_float(b, M_PI_4f),
>>> > +                               nir_bcsel(b, nir_feq(b, x, minus_inf),
>>> > +                                            m_3_pi_4,
>>> > +                                            nir_imm_float(b, M_PI_2f))));
>>> > +
>>> > +   /* Case: x is +-INF */
>>> > +   nir_ssa_def *x_is_inf_then =
>>> > +      nir_fmul(b, nir_fsign(b, y),
>>> > +                  nir_bcsel(b, nir_feq(b, x, inf),
>>> > +                               zero,
>>> > +                               nir_imm_float(b, M_PIf)));
>>> > +
>>> 
>>> I don't think we need all these special cases.  The majority of the
>>> infinity/zero handling rules required by IEEE are fairly natural and
>>> would be taken care of without any additional effort by the
>>> floating-point division operation and single-argument atan function
>>> below if they propagated infinities and zeroes according to IEEE rules.
>>> 
>>> I had a look at the test results myself and noticed that the failures
>>> are for the most part due to a precision problem in the current
>>> implementation that doesn't only affect infinity -- Relative precision
>>> also explodes as x grows above certain point, infinities just make the
>>> problem catastrophic and cause it to return NaN instead of the
>>> expected finite value.  The reason for the precision problem is that
>>> fdiv is later on lowered into an fmul+frcp sequence, and the latter may
>>> flush the result to zero if the denominator was so huge that its
>>> reciprocal would be denormalized.  If the numerator happened to be
>>> infinite you may end up with ∞/huge = NaN for the same reason.
>>> 
>>
>> Right. For this case I'd submitted a patch to the test itself, that
>> roughly speaking assumes any result as possible if denominator is big
>> enough.
>>
>> https://gerrit.khronos.org/#/c/524/
>>
>>
>> I understand with your alternative proposal you would also handle this
>> case correctly, making the CTS change not required, right?
>>
> Yes, I think the CTS had found a legitimate bug in our atan2
> implementation, patching it only conceals the problem -- Granted that
> trigonometric functions have unspecified precision according to the GLSL
> spec [so you could argue that the majority of these tests shouldn't even
> exist in the first place ;)], but the result was over 8 million ULP off
> for a range of inputs which seems a bit over the top.  With my atan2
> implementation the related CTS tests pass without any changes.
>


Hey! I didn't send the patches yet because further testing uncovered a
number of unexpected issues -- It broke i915 and other
non-integer-capable drivers due to their lack of a csel instruction
(that could be fixed easily) and their lack of any reasonable mechanism
I could use to take the sign bit of a (potentially zero) floating point
value (fixing this involved fully rewriting the atan2 implementation).
On top of that the new approximation seemed to uncover a back-end bug I
haven't addressed yet, but it seemed to be worked around by the other
patch that implements IEEE-compliant handling of atan2(±∞, ±∞) by pure
luck, so feel free to take a look at my jenkins branch if you're
curious:

https://cgit.freedesktop.org/~currojerez/mesa/log/?h=jenkins

Funnily enough, with the new atan2 implementation the infinity-handling
patches (which are required for correctness of other finite arguments
due to the back-end bug) are no longer required for correct handling of
(±∞, ±∞) (IOW we can implement the whole infinity corner cases with zero
instructions), but the reason why that's the case is somewhat obscure:
It relies on the fact that the i965 implementation of the fmin and fmax
instructions emitted by build_atan() (or do_atan() in the GLSL builtin
generator) doesn't behave according to the GLSL spec when one of the
arguments is a NaN, instead they always return the non-NaN value as the
IEEE 754 spec recommends, but that's is likely by accident and I don't
think we want to rely on it in production...

>>
>>> On top of that there seem to be other issues with the current atan2
>>> implementation:
>>> 
>>>  - It doesn't handle zeros correctly.  This may be related to your
>>>    observation that denorm arguments cause it to give bogus results, but
>>>    the problem doesn't seem to be related to denorms in particular, but
>>>    to the fact that denorms can get flushed to -0 which is in turn
>>>    handled incorrectly.  The reason is that the existing code uses 'y >=
>>>    0' to determine on which side of the branch cut we are, but that
>>>    causes the discontinuity to end up along the y=-epsilon line instead
>>>    of along the y=0 line as IEEE requires -- IOW, with the current
>>>    implementation very small negative y values behave as if they were
>>>    positive which causes the result to have a large absolute error of
>>>    2π.
>>> 
>>>  - It doesn't give IEEE-compliant results when both arguments are
>>>    simultaneously infinite.  This is not surprising given that IEEE
>>>    defining atan2(∞, ∞) = π/4 is fairly artificial (as are the other
>>>    rules for combinations of positive or negative infinity), strictly
>>>    speaking taking the limit along any direction other than the diagonal
>>>    would be as right or wrong.  To make the matter worse IEEE disagrees
>>>    with itself on the direction limits are taken when it goes on and
>>>    defines e.g. atan2(+0, -0) = π taken along the horizontal.  Luckily
>>>    GLSL specifically allows implementations to deviate from IEEE rules
>>>    in a neighborhood of zero ("Results are undefined if x and y are both
>>>    0."), so we don't need to care about that one.
>>> 
>>> Except for the last point, these issues seem serious enough to be worth
>>> fixing -- I'll reply to this thread with an alternative implementation
>>> of atan2 that addresses the first two issues (and actually uses less
>>> instructions than the current implementation), plus another, optional
>>> patch that addresses the third issue in order to make the CTS tests
>>> happy (we can implement the whole oddball atan2(±∞, ±∞) corner cases as
>>> IEEE says with only three additional Gen instructions by being bit
>>> smart, but one could argue that Khronos should just make atan2(±∞, ±∞)
>>> undefined since they're already deviating from IEEE-compliant behavior
>>> at (±0, ±0)).
>>> 
>>> P.S.: Been waiting for hours to get jenkins results and it looks like
>>>       the CI is busted, will send the patches once I get positive
>>>       results -- Likely not today ;).
>>> 
>>> > +   /* If x > 0 */
>>> > +   nir_ssa_def *x_not_zero =
>>> > +      nir_fne(b, x, zero);
>>> >  
>>> >     /* Then...call atan(y/x) and fix it up: */
>>> >     nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));
>>> > +
>>> >     nir_ssa_def *r_then =
>>> >        nir_bcsel(b, nir_flt(b, x, zero),
>>> >                     nir_fadd(b, atan1,
>>> > @@ -323,7 +352,14 @@ build_atan2(nir_builder *b, nir_ssa_def *y, 
>>> > nir_ssa_def *x)
>>> >     nir_ssa_def *r_else =
>>> >        nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));
>>> >  
>>> > -   return nir_bcsel(b, condition, r_then, r_else);
>>> > +   /* Everything together */
>>> > +   return nir_bcsel(b, y_is_inf,
>>> > +                       y_is_inf_then,
>>> > +                       nir_bcsel(b, x_is_inf,
>>> > +                                    x_is_inf_then,
>>> > +                                    nir_bcsel(b, x_not_zero,
>>> > +                                                 r_then,
>>> > +                                                 r_else)));
>>> >  }
>>> >  
>>> >  static nir_ssa_def *
>>> > -- 
>>> > 2.11.0
>>> > 
>>> > _______________________________________________
>>> > mesa-dev mailing list
>>> > mesa-dev@lists.freedesktop.org
>>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] nir/spirv/glsl450: rewrite atan2 to deal with infinities

Reply via email to