2011/7/25 Thilo Schulz <a...@ats.s.bawue.de>

> Sorry,
>
> I somehow only now got to read more of your email. Here's my reply:
>
> On Thursday, 14. July 2011 11:35:29 Matthias Bentrup wrote:
> > Also regarding rev 2052, I think that changing the default rounding mode
> > from round-to-nearest to round-to-zero should be avoided, as it is less
> > precise and round-to-nearest is in fact what all libraries expect.
>
> Actually, I always thought round-towards-zero behaviour is standard? When
> compiling a test program with gcc that casts float to int this is the
> behaviour you get. All I'm doing is make this consistent for all platforms
> by explicitly setting the cw.
> This is actually mirrored by id introducing the snapvector trap call for
> VMs. They originally used a simple float cast for snapvector (fld + fistp)
> and it would result in different behaviour on different operating
> systems/platforms.

Yes, round to zero is the standard rounding mode for ftol and float-to-int conversion in C, but round-to-nearest is the standard rounding mode for every other floating point operation (http://www.gnu.org/s/hello/manual/libc/Rounding.html).
Unfortunately the x87 has to change rounding modes to implement ftol with round-to-zero semantics, and this was a *very* expensive operation on early Pentiums. I can't find the link now, but I remember that Intel advised avoiding the default ftol implementation and using a fistp-based round-to-nearest assembly function instead.

> > Q_ftol is only used in places where precision is
> > not important (e.g. converting colors from float to byte), but the
> > default rounding mode affects every fp operation, so it could influence
> > the game physics etc.
>
> The game physics are calculated by the VMs, and these have special
> conversion functions.

The rounding mode affects all floating point operations, not just float to int conversion. And round-to-zero has a higher relative error than round-to-nearest.

> > This is especially bad as you have to switch the mxcsr back to
> > round-to-nearest for the snapvector function. Control register changes
> > are very expensive and should be avoided.
>
> This is what's been happening for 10 years in quake3 already, if you check
> the code before I applied those changes.

Oh. The original SnapVector assembly loaded the control word to set round-to-nearest, but that should have been the default mode anyway. I somehow remembered that SnapVector was using the default rounding mode, and I didn't think that it explicitly loaded the control word. But if we have round-to-nearest by default we can skip the control word load.

> > Speaking of snapvector, I also would avoid the maskmovdqu instruction as
> > it writes directly to memory, bypassing the cache. When you re-read the
> > vector later you will always encounter a cache-miss, so I think it would
> > be better to write the result back with movups, preserving the original
> > 4th value:
> >
> > qsnapvectorsse PROC
> >   movaps   xmm1, ssemask   ; initialize the mask register
> >   movups   xmm0, [rcx]     ; here is stored our vector. Read 4 values in one go
> >   movaps   xmm2, xmm0      ; keep a copy of the original data
> >   andps    xmm0, xmm1      ; set the fourth value to zero in xmm0
> >   andnps   xmm1, xmm2      ; set values one, two and three to zero in xmm1
> >   cvtps2dq xmm0, xmm0      ; convert 4 single fp to int
> >   cvtdq2ps xmm0, xmm0      ; convert 4 int to single fp
> >   orps     xmm0, xmm1      ; combine all 4 values again
> >   movups   [rcx], xmm0     ; write 3 rounded and 1 unchanged values back to memory
> >   ret
> > qsnapvectorsse ENDP
> >
> > (assuming the global round mode is round-to-nearest ofc).
>
> Hmm, yeah. I guess that would work too. Did you do performance comparisons
> via things like profiling?

I have run both versions of snapvector 10 million times in a loop and measured them with CodeAnalyst:

  The maskmovdqu version: CPU Clocks 278678, IPC 0.06, DC miss rate 0.02
  The andps/orps version: CPU Clocks  61028, IPC 0.36, DC miss rate 0

(Both versions without any control-word loads/stores.)

Overall I don't think that this makes a noticeable difference to either the performance or the precision of the game. But when it doesn't make a difference I'd always stick to the standard settings. If SSE is available, we can use the cvtt* opcodes for fast truncating conversions and still keep the standard round-to-nearest mode for everything else. If we have no SSE we can choose the "correct" or the "fast" float-to-int conversion, but I'd prefer to keep the default rounding mode for all the other operations.

> --
> Thilo Schulz
>
> _______________________________________________
> ioquake3 mailing list
> ioquake3@lists.ioquake.org
> http://lists.ioquake.org/listinfo.cgi/ioquake3-ioquake.org
> By sending this message I agree to love ioquake3 and libsdl.