Sorry, I somehow only now got to read more of your email. Here's my reply:
On Thursday, 14. July 2011 11:35:29 Matthias Bentrup wrote:
> Also regarding rev 2052, I think that changing the default rounding mode
> from round-to-nearest to round-to-zero should be avoided, as it is less
> precise and round-to-nearest is in fact what all libraries expect.

Actually, I always thought round-towards-zero behaviour is standard? When
compiling a test program with gcc that casts float to int, this is the
behaviour you get. All I'm doing is making this consistent across all
platforms by explicitly setting the control word (cw). This is actually
mirrored by id introducing the snapvector trap call for VMs. They originally
used a simple float cast for snapvector (fld + fistp), and it would result in
different behaviour on different operating systems/platforms.

> Q_ftol is only used in places where precision is
> not important (e.g. converting colors from float to byte), but the default
> rounding mode affects every fp operation, so it could influence the game
> physics etc.

The game physics are calculated by the VMs, and these have special conversion
functions.

> This is especially bad as you have to switch the mxcsr back to
> round-to-nearest for the snapvector function. Control register changes are
> very expensive and should be avoided.

This is what's been happening for 10 years in quake3 already, if you check the
code before I applied those changes.

> Speaking of snapvector, I also would avoid the maskmovdqu instruction as it
> writes directly to memory, bypassing the cache. When you re-read the vector
> later you will always encounter a cache-miss, so I think it would be better
> to write the result back with movups, preserving the original 4th value :
>
> qsnapvectorsse PROC
>   movaps xmm1, ssemask  ; initialize the mask register
>   movups xmm0, [rcx]    ; here is stored our vector. Read 4 values in one go
>   movaps xmm2, xmm0     ; keep a copy of the original data
>   andps xmm0, xmm1      ; set the fourth value to zero in xmm0
>   andnps xmm1, xmm2     ; set values one, two and three to zero in xmm1
>   cvtps2dq xmm0, xmm0   ; convert 4 single fp to int
>   cvtdq2ps xmm0, xmm0   ; convert 4 int to single fp
>   orps xmm0, xmm1       ; combine all 4 values again
>   movups [rcx], xmm0    ; write 3 rounded and 1 unchanged values back to memory
>   ret
> qsnapvectorsse ENDP
>
> (assuming the global round mode is round-to-nearest ofc).

Hmm, yeah. I guess that would work too. Did you do performance comparisons via
things like profiling?

-- 
Thilo Schulz
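
For anyone who wants to try the two rounding behaviours discussed above, here is a rough C sketch using SSE2 intrinsics. It is only an illustration, not the ioquake3 code: the names snap_vector_sse and truncate_cast are made up for this example, the mask constant is my guess at what ssemask contains, and it assumes an x86 target with the MXCSR left at its default round-to-nearest mode. Like the quoted routine, it reads and writes four floats, so the buffer needs room for a fourth value.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */

/* Hypothetical C rendering of the quoted routine: round x, y, z to whole
 * numbers using the current MXCSR rounding mode and leave the 4th value
 * untouched. v must point to 4 readable and writable floats. */
static void snap_vector_sse(float *v)
{
    /* Assumed mask layout: all bits set in lanes 0..2, zero in lane 3. */
    const __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));

    __m128 orig = _mm_loadu_ps(v);              /* movups: read 4 values   */
    __m128 xyz  = _mm_and_ps(orig, mask);       /* andps: zero the 4th     */
    __m128 w    = _mm_andnot_ps(mask, orig);    /* andnps: keep only 4th   */

    /* cvtps2dq + cvtdq2ps: float -> int -> float, honouring MXCSR. */
    __m128 rounded = _mm_cvtepi32_ps(_mm_cvtps_epi32(xyz));

    _mm_storeu_ps(v, _mm_or_ps(rounded, w));    /* orps + movups: write back */
}

/* For contrast: a plain C cast always truncates toward zero,
 * whatever rounding mode the FPU is in. */
static int truncate_cast(float f)
{
    return (int)f;
}
```

Calling snap_vector_sse on a float[4] holding 1.4, -2.6, 3.5 and 9.75 should leave the fourth entry alone and round the first three to the nearest whole number (ties go to the even value), while truncate_cast(2.7f) simply drops the fraction.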
_______________________________________________
ioquake3 mailing list
ioquake3@lists.ioquake.org
http://lists.ioquake.org/listinfo.cgi/ioquake3-ioquake.org
By sending this message I agree to love ioquake3 and libsdl.