Francisco Jerez <curroje...@riseup.net> writes: > "Juan A. Suarez Romero" <jasua...@igalia.com> writes: > >> On Sun, 2017-01-22 at 00:20 -0800, Francisco Jerez wrote: >>> "Juan A. Suarez Romero" <jasua...@igalia.com> writes: >>> >>> > Rewrite atan2(y,x) to cover (+/-)INF values. >>> > >>> > This fixes several test cases in Vulkan CTS >>> > (dEQP-VK.glsl.builtin.precision.atan2.*) >>> > >>> > v2: do not flush denorms to 0 (jasuarez) >>> > --- >>> > src/compiler/spirv/vtn_glsl450.c | 48 >>> > +++++++++++++++++++++++++++++++++++----- >>> > 1 file changed, 42 insertions(+), 6 deletions(-) >>> > >>> > diff --git a/src/compiler/spirv/vtn_glsl450.c >>> > b/src/compiler/spirv/vtn_glsl450.c >>> > index 0d32fddbef..d52a22c0c3 100644 >>> > --- a/src/compiler/spirv/vtn_glsl450.c >>> > +++ b/src/compiler/spirv/vtn_glsl450.c >>> > @@ -299,18 +299,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x) >>> > return nir_fmul(b, tmp, nir_fsign(b, y_over_x)); >>> > } >>> > >>> > +/* >>> > + * Computes atan2(y,x) >>> > + */ >>> > static nir_ssa_def * >>> > build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x) >>> > { >>> > nir_ssa_def *zero = nir_imm_float(b, 0.0f); >>> > - >>> > - /* If |x| >= 1.0e-8 * |y|: */ >>> > - nir_ssa_def *condition = >>> > - nir_fge(b, nir_fabs(b, x), >>> > - nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y))); >>> > + nir_ssa_def *inf = nir_imm_float(b, INFINITY); >>> > + nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY); >>> > + nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f), >>> > + nir_imm_float(b, M_PI_4f)); >>> > + >>> > + /* if y == +-INF */ >>> > + nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf); >>> > + >>> > + /* if x == +-INF */ >>> > + nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf); >>> > + >>> > + /* Case: y is +-INF */ >>> > + nir_ssa_def *y_is_inf_then = >>> > + nir_fmul(b, nir_fsign(b, y), >>> > + nir_bcsel(b, nir_feq(b, x, inf), >>> > + nir_imm_float(b, M_PI_4f), >>> > + nir_bcsel(b, nir_feq(b, x, minus_inf), >>> > + m_3_pi_4, >>> > + nir_imm_float(b, M_PI_2f)))); >>> > + >>> > + /* Case: x is +-INF */ >>> > + nir_ssa_def *x_is_inf_then = >>> > + nir_fmul(b, nir_fsign(b, y), >>> > + nir_bcsel(b, nir_feq(b, x, inf), >>> > + zero, >>> > + nir_imm_float(b, M_PIf))); >>> > + >>> >>> I don't think we need all these special cases. The majority of the >>> infinity/zero handling rules required by IEEE are fairly natural and >>> would be taken care of without any additional effort by the >>> floating-point division operation and single-argument atan function >>> below if they propagated infinities and zeroes according to IEEE rules. >>> >>> I had a look at the test results myself and noticed that the failures >>> are for the most part due to a precision problem in the current >>> implementation that doesn't only affect infinity -- Relative precision >>> also explodes as x grows above certain point, infinities just make the >>> problem catastrophic and cause it to return NaN instead of the >>> expected finite value. The reason for the precision problem is that >>> fdiv is later on lowered into an fmul+frcp sequence, and the latter may >>> flush the result to zero if the denominator was so huge that its >>> reciprocal would be denormalized. If the numerator happened to be >>> infinite you may end up with ∞/huge = NaN for the same reason. >>> >> >> Right. For this case I'd submitted a patch to the test itself, that >> roughly speaking assumes any result as possible if denominator is big >> enough. >> >> https://gerrit.khronos.org/#/c/524/ >> >> >> I understand with your alternative proposal you would also handle this >> case correctly, making the CTS change not required, right? >> > Yes, I think the CTS had found a legitimate bug in our atan2 > implementation, patching it only conceals the problem -- Granted that > trigonometric functions have unspecified precision according to the GLSL > spec [so you could argue that the majority of these tests shouldn't even > exist in the first place ;)], but the result was over 8 million ULP off > for a range of inputs which seems a bit over the top. With my atan2 > implementation the related CTS tests pass without any changes. >
Hey! I didn't send the patches yet because further testing uncovered a number of unexpected issues -- It broke i915 and other non-integer-capable drivers due to their lack of a csel instruction (that could be fixed easily) and their lack of any reasonable mechanism I could use to take the sign bit of a (potentially zero) floating point value (fixing this involved fully rewriting the atan2 implementation). On top of that the new approximation seemed to uncover a back-end bug I haven't addressed yet, but it seemed to be worked around by the other patch that implements IEEE-compliant handling of atan2(±∞, ±∞) by pure luck, so feel free to take a look at my jenkins branch if you're curious: https://cgit.freedesktop.org/~currojerez/mesa/log/?h=jenkins Funnily enough, with the new atan2 implementation the infinity-handling patches (which are required for correctness of other finite arguments due to the back-end bug) are no longer required for correct handling of (±∞, ±∞) (IOW we can implement the whole infinity corner cases with zero instructions), but the reason why that's the case is somewhat obscure: It relies on the fact that the i965 implementation of the fmin and fmax instructions emitted by build_atan() (or do_atan() in the GLSL builtin generator) doesn't behave according to the GLSL spec when one of the arguments is a NaN, instead they always return the non-NaN value as the IEEE 754 spec recommends, but that's is likely by accident and I don't think we want to rely on it in production... >> >>> On top of that there seem to be other issues with the current atan2 >>> implementation: >>> >>> - It doesn't handle zeros correctly. This may be related to your >>> observation that denorm arguments cause it to give bogus results, but >>> the problem doesn't seem to be related to denorms in particular, but >>> to the fact that denorms can get flushed to -0 which is in turn >>> handled incorrectly. The reason is that the existing code uses 'y >= >>> 0' to determine on which side of the branch cut we are, but that >>> causes the discontinuity to end up along the y=-epsilon line instead >>> of along the y=0 line as IEEE requires -- IOW, with the current >>> implementation very small negative y values behave as if they were >>> positive which causes the result to have a large absolute error of >>> 2π. >>> >>> - It doesn't give IEEE-compliant results when both arguments are >>> simultaneously infinite. This is not surprising given that IEEE >>> defining atan2(∞, ∞) = π/4 is fairly artificial (as are the other >>> rules for combinations of positive or negative infinity), strictly >>> speaking taking the limit along any direction other than the diagonal >>> would be as right or wrong. To make the matter worse IEEE disagrees >>> with itself on the direction limits are taken when it goes on and >>> defines e.g. atan2(+0, -0) = π taken along the horizontal. Luckily >>> GLSL specifically allows implementations to deviate from IEEE rules >>> in a neighborhood of zero ("Results are undefined if x and y are both >>> 0."), so we don't need to care about that one. >>> >>> Except for the last point, these issues seem serious enough to be worth >>> fixing -- I'll reply to this thread with an alternative implementation >>> of atan2 that addresses the first two issues (and actually uses less >>> instructions than the current implementation), plus another, optional >>> patch that addresses the third issue in order to make the CTS tests >>> happy (we can implement the whole oddball atan2(±∞, ±∞) corner cases as >>> IEEE says with only three additional Gen instructions by being bit >>> smart, but one could argue that Khronos should just make atan2(±∞, ±∞) >>> undefined since they're already deviating from IEEE-compliant behavior >>> at (±0, ±0)). >>> >>> P.S.: Been waiting for hours to get jenkins results and it looks like >>> the CI is busted, will send the patches once I get positive >>> results -- Likely not today ;). >>> >>> > + /* If x > 0 */ >>> > + nir_ssa_def *x_not_zero = >>> > + nir_fne(b, x, zero); >>> > >>> > /* Then...call atan(y/x) and fix it up: */ >>> > nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x)); >>> > + >>> > nir_ssa_def *r_then = >>> > nir_bcsel(b, nir_flt(b, x, zero), >>> > nir_fadd(b, atan1, >>> > @@ -323,7 +352,14 @@ build_atan2(nir_builder *b, nir_ssa_def *y, >>> > nir_ssa_def *x) >>> > nir_ssa_def *r_else = >>> > nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f)); >>> > >>> > - return nir_bcsel(b, condition, r_then, r_else); >>> > + /* Everything together */ >>> > + return nir_bcsel(b, y_is_inf, >>> > + y_is_inf_then, >>> > + nir_bcsel(b, x_is_inf, >>> > + x_is_inf_then, >>> > + nir_bcsel(b, x_not_zero, >>> > + r_then, >>> > + r_else))); >>> > } >>> > >>> > static nir_ssa_def * >>> > -- >>> > 2.11.0 >>> > >>> > _______________________________________________ >>> > mesa-dev mailing list >>> > mesa-dev@lists.freedesktop.org >>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev