Am 13.04.2016 um 12:12 schrieb Jose Fonseca: > On 13/04/16 06:51, Jose Fonseca wrote: >> On 13/04/16 04:00, srol...@vmware.com wrote: >>> From: Roland Scheidegger <srol...@vmware.com> >>> >>> We used to use sse roundps intrinsic directly, but switched to use the >>> llvm >>> intrinsics for rounding with e4f01da15d8c6ce3e8c77ff3ff3d2ce2574a3f7b. >>> However, llvm semantics follows standard math lib round function >>> which is >>> specced to do roundNearestAwayFromZero but we really want >>> roundNearestEven >>> (moreoever, using round generates atrocious code since the cpu can't >>> do it >>> directly and it results in scalar calls to libm __roundf). >>> So, use llvm.nearbyint instead, which does exactly the right thing, >>> and even >>> has the advantage of being available with llvm 3.3 too. (I've >>> verified it >>> actually generates a roundps instruction with llvm 3.3.) >>> >>> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909 >> >> Thanks Roland. >> >> I should've cought that. I think I forget to actually look at >> llvm.round's assembly as I only found that intrisic much later. I >> presumed it was roundps. >> >>> --- >>> src/gallium/auxiliary/gallivm/lp_bld_arit.c | 6 +----- >>> 1 file changed, 1 insertion(+), 5 deletions(-) >>> >>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c >>> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c >>> index 0c43617..74f1683 100644 >>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c >>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c >>> @@ -1863,11 +1863,7 @@ lp_build_round_arch(struct lp_build_context *bld, >>> >>> switch (mode) { >>> case LP_BUILD_ROUND_NEAREST: >>> - if (HAVE_LLVM >= 0x0304) { >>> - intrinsic_root = "llvm.round"; >>> - } else { >>> - return lp_build_nearest_sse41(bld, a); >> >> You can remove lp_build_nearest_sse41 too while you're at it. >> >>> - } >>> + intrinsic_root = "llvm.nearbyint"; >>> break; >>> case LP_BUILD_ROUND_FLOOR: >>> intrinsic_root = "llvm.floor"; >>> >> >> We should add a test case to lp_test_arit too. Like this: >> >> diff --git a/src/gallium/drivers/llvmpipe/lp_test_arit.c >> b/src/gallium/drivers/llvmpipe/lp_test_arit.c >> index a0f2db7..ba831f3 100644 >> --- a/src/gallium/drivers/llvmpipe/lp_test_arit.c >> +++ b/src/gallium/drivers/llvmpipe/lp_test_arit.c >> @@ -218,6 +218,7 @@ const float round_values[] = { >> -10.0, -1, 0.0, 12.0, >> -1.49, -0.25, 1.25, 2.51, >> -0.99, -0.01, 0.01, 0.99, >> + -1.5, -0.5, 0.5, 1.5, >> 1.401298464324817e-45f, // smallest denormal >> -1.401298464324817e-45f, >> 1.62981451e-08f, >> @@ -283,7 +284,7 @@ unary_tests[] = { >> {"sin", &lp_build_sin, &sinf, sincos_values, >> Elements(sincos_values), 20.0 }, >> {"cos", &lp_build_cos, &cosf, sincos_values, >> Elements(sincos_values), 20.0 }, >> {"sgn", &lp_build_sgn, &sgnf, exp2_values, Elements(exp2_values), >> 20.0 }, >> - {"round", &lp_build_round, &roundf, round_values, >> Elements(round_values), 24.0 }, >> + {"round", &lp_build_round, &nearbyintf, round_values, >> Elements(round_values), 24.0 }, >> {"trunc", &lp_build_trunc, &truncf, round_values, >> Elements(round_values), 24.0 }, >> {"floor", &lp_build_floor, &floorf, round_values, >> Elements(round_values), 24.0 }, >> {"ceil", &lp_build_ceil, &ceilf, round_values, >> Elements(round_values), 24.0 }, >> >> >> It seems nearbyintf is available on MSVC too. >> >> It's just not clear whether MSVC implementation follows round-to-even: >> the nearbyint doc page is missing from MSDN, and rintf page explicitly >> states that rint(2.5) is 3 per, >> https://msdn.microsoft.com/en-us/library/dn465165.aspx > > I've verified this works -- > https://ci.appveyor.com/project/jrfonseca/mesa/build/20 --, so I went > ahead and commited your change, plus the lp_test_arit one. >
Ok. That msdn page is funny ;-). It is true that things are (as stated there as well) according to current rounding mode, but that's pretty much always round-to-nearest-even (current rounding mode is of course switchable, but I think most architectures can't do anything else than nearest-even for their nearest mode). (It is actually not really obvious that tgsi "round" needs to be round-to-nearest-even - glsl has both round and roundEven instructions, for round other tie-breakers are allowed too. But we just translate it all the same since there doesn't seem to be any need for another roundNearest instruction since everybody can do round-to-nearest-even. d3d10 OTOH only has round_ne.) Roland _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev