CC mesa-dev.

This looks good to me. I am starting to wonder, though, why we don't just use lrintf() and let the compiler sort it out (for x86 too). Some quick experiments show, however, that:
- llvm's clang always emits a call to libm's lrintf, which then does (on x86_64) cvtss2si %xmm0,%rax as expected. So the cost is probably twice as high as it could be, due to the unnecessary library call.
- gcc emits the same library call unless you specify -fno-math-errno (or some more aggressive math optimization flag), in which case it emits the cvtss2si on its own. That is fairly stupid, because this function doesn't set errno in any case, so the inline version could be used independent of -fno-math-errno.

Speaking of -fno-math-errno, why don't we use that in mesa? I know the fast-math stuff can be problematic, but no one is _ever_ interested in math error numbers.

Speaking of which, I'm not really sure why IROUND isn't doing the same. Yes, it rounds ties away from zero, but I doubt that matters; it would probably be better to match whatever rounding is used in hw (GL doesn't seem to specify tie-breaker rules for round to nearest afaict).

FWIW IROUND, along with its 64-bit sibling IROUND64 (and IROUND_POS), is not really correct in any case. There exist floats where f + 0.5f rounds up to the next integer incorrectly. E.g. take the largest float smaller than 0.5f (0.49999997f): if you add 0.5f, the exact result lies right between the largest float smaller than 1.0f and 1.0f, so the hw uses its tie-breaker rule (round to nearest even for your typical hw with the typical rounding mode set), making this 1.0. The rounded integer is thus 1, which is just plain wrong no matter the round-to-nearest tie-breaker rule. There are ways to fix it (the obvious one is to add 0.5 as a double), but I don't think we should even try that; we should assume lrintf can do a decent job on hw we care about (the compiler not doing its job right is a pity, but it might not be too bad even if it uses a lib call).

Roland

On 31.07.2015 at 11:39, Jochen Rollwagen wrote:
> Hi,
>
> I've produced and tested the following mesa patch for powerpc platforms
> (based on/inspired by commit 989d2e370993c87d1bbda4950657bfcc5b0a58dd
> <http://cgit.freedesktop.org/mesa/mesa/commit/?id=989d2e370993c87d1bbda4950657bfcc5b0a58dd>
> "Add an accelerated version of F_TO_I for x86_64"):
>
> diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> index 09e55eb..e4feb83 100644
> --- a/src/mesa/main/imports.h
> +++ b/src/mesa/main/imports.h
> @@ -296,6 +296,14 @@ static inline int F_TO_I(float f)
>     return r;
>  #elif defined(__x86_64__)
>     return _mm_cvt_ss2si(_mm_load_ss(&f));
> +#elif defined(__GNUC__) && defined(__PPC__)
> +   long res [2] ;
> +
> +   __asm__( "fctiw %0,%0\n\t"
> +            "stfd %0,%1\n\t"
> +            : "=f" (f), "=o" (res): );
> +
> +   return res [1] ;
>  #else
>     return IROUND(f);
>  #endif
>
>
> Any chance to get this into mesa for the few other powerpc hangouts
> still around? Performance is markedly improved (although I didn't
> really measure it :-) )