Re: [ioquake3] Dual protocol support and a question

Matthias Bentrup Tue, 26 Jul 2011 10:46:28 -0700

2011/7/25 Thilo Schulz <a...@ats.s.bawue.de>

> Sorry,
>
> I somehow only now got to read more of your email. Here's my reply:
>
> On Thursday, 14. July 2011 11:35:29 Matthias Bentrup wrote:
> > Also regarding rev 2052, I think that changing the default rounding mode
> > from round-to-nearest to round-to-zero should be avoided, as it is less
> > precise and round-to-nearest is in fact what all libraries expect.
>
> Actually, I always thought round-towards-zero behaviour is standard? When
> compiling a test program with gcc that casts float to int this is the
> behaviour you get. All I'm doing is make this consistent for all platforms by
> explicitly setting the cw.
> This is actually mirrored by id introducing the snapvector trap call for VMs.
> They originally used a simple float cast for snapvector (fld + fistp) and it
> would result in different behaviour on different operating systems/platforms.
>
>
Yes, round-to-zero is the standard rounding mode for ftol and float-to-int
conversion in C, but round-to-nearest is the standard rounding mode for
every other floating point operation (
http://www.gnu.org/s/hello/manual/libc/Rounding.html).
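
To make the difference concrete, here is a minimal C sketch, assuming an x86
compiler with SSE support. The two helper names are made up for illustration
and are not ioquake3 functions:

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_cvtss_si32 */

/* A C cast always truncates toward zero, whatever the FPU rounding mode is. */
int cast_to_int(float f)
{
    return (int)f;
}

/* cvtss2si converts using the current MXCSR rounding mode, which is
 * round-to-nearest-even unless something has changed it. */
int convert_current_mode(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```

With the default mode, cast_to_int(2.7f) gives 2 while
convert_current_mode(2.7f) gives 3.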


Unfortunately the x87 has to change rounding modes to implement ftol with
round-to-zero semantics, and this was a *very* expensive operation on early
Pentiums. I can't find the link now, but I remember that Intel advised
avoiding the default ftol implementation and using a fistp-based
round-to-nearest assembly function instead.
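
Roughly, the two approaches look like this. This is only a sketch using
GCC/Clang inline assembly on x86; the function names are hypothetical and the
code is not taken from any compiler runtime or from ioquake3:

```c
/* GCC/Clang inline assembly, x86 only. Helper names are hypothetical. */

/* What a truncating ftol has to do on x87: save the control word, switch
 * the rounding control to truncate, convert, restore the control word. */
int ftol_truncate_x87(float f)
{
    unsigned short saved_cw, trunc_cw;
    int result;

    __asm__ volatile ("fnstcw %0" : "=m"(saved_cw));
    trunc_cw = saved_cw | 0x0C00;   /* RC bits 10-11 = 11b: round toward zero */
    __asm__ volatile ("fldcw %0" : : "m"(trunc_cw));
    __asm__ volatile ("fistpl %0" : "=m"(result) : "t"(f) : "st");
    __asm__ volatile ("fldcw %0" : : "m"(saved_cw));
    return result;
}

/* The cheap alternative: a bare fistp that uses whatever mode is already
 * set (round-to-nearest by default), so no control word traffic at all. */
int ftol_nearest_x87(float f)
{
    int result;

    __asm__ volatile ("fistpl %0" : "=m"(result) : "t"(f) : "st");
    return result;
}
```

The first function pays for two fldcw instructions per conversion, the second
for none, at the price of rounding instead of truncating.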


> > Q_ftol is only used in places where precision is
> > not important (e.g. converting colors from float to byte), but the default
> > rounding mode affects every fp operation, so it could influence the game
> > physics etc.
>
> The game physics are calculated by the VMs, and these have special conversion
> functions.
>

The rounding mode affects all floating point operations, not just float-to-int
conversion. And round-to-zero has a higher relative error than
round-to-nearest: each operation can be off by almost a full ulp instead of at
most half an ulp.
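
A small experiment shows the effect on ordinary arithmetic. This is a sketch
that assumes float math is compiled to SSE instructions (the default on
x86-64), so that MXCSR controls the rounding; sum_tenths is a hypothetical
helper, not engine code:

```c
#include <xmmintrin.h>  /* _MM_GET_ROUNDING_MODE, _MM_SET_ROUNDING_MODE */

/* Add 0.1f to an accumulator n times under the given MXCSR rounding mode
 * (_MM_ROUND_NEAREST or _MM_ROUND_TOWARD_ZERO), then restore the old mode. */
float sum_tenths(unsigned int mode, int n)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    volatile float sum = 0.0f;  /* volatile: stop the compiler folding the loop */
    int i;

    _MM_SET_ROUNDING_MODE(mode);
    for (i = 0; i < n; i++)
        sum = sum + 0.1f;
    _MM_SET_ROUNDING_MODE(saved);
    return sum;
}
```

With n = 1000000 the ideal result is about 100000. The round-toward-zero run
always errs in the same direction and ends up further from that value than
the round-to-nearest run.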


>
> > This is especially bad as you have to switch the mxcsr back to
> > round-to-nearest for the snapvector function. Control register changes are
> > very expensive and should be avoided.
>
> This is what's been happening for 10 years in quake3 already, if you check the
> code before I applied those changes.
>

Oh. The original SnapVector assembly loaded the control word to set
round-to-nearest, but that should have been the default mode anyway. I
remembered that SnapVector relied on the default rounding mode, but not
that it explicitly loaded the control word. If round-to-nearest stays the
default, we can skip the control word load.


> > Speaking of snapvector, I also would avoid the maskmovdqu instruction as it
> > writes directly to memory, bypassing the cache. When you re-read the vector
> > later you will always encounter a cache-miss, so I think it would be better
> > to write the result back with movups, preserving the original 4th value:
> >
> >   qsnapvectorsse PROC
> > movaps xmm1, ssemask ; initialize the mask register
> > movups xmm0, [rcx] ; here is stored our vector. Read 4 values in one go
> > movaps xmm2, xmm0 ; keep a copy of the original data
> > andps xmm0, xmm1 ; set the fourth value to zero in xmm0
> > andnps xmm1, xmm2 ; set values one, two and three to zero in xmm1
> > cvtps2dq xmm0, xmm0 ; convert 4 single fp to int
> > cvtdq2ps xmm0, xmm0 ; convert 4 int to single fp
> > orps xmm0, xmm1 ; combine all 4 values again
> > movups [rcx], xmm0  ; write 3 rounded and 1 unchanged values back to memory
> > ret
> >   qsnapvectorsse ENDP
> >
> > (assuming the global round mode is round-to-nearest ofc).
>
> Hmm, yeah. I guess that would work too. Did you do performance comparisons via
> things like profiling?
>
>
I have run both versions of snapvector 10 million times in a loop and
measured them with CodeAnalyst:
The maskmovdqu version: CPU Clocks 278678, IPC 0.06, DC miss rate 0.02
The andps/orps version: CPU Clocks 61028, IPC 0.36, DC miss rate 0

(Both versions without any control-word loads/stores).
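
For anyone who wants to try the andps/orps variant from C, it can be written
with SSE2 intrinsics roughly as follows. This is a sketch: snap_vector_sse2
is a made-up name, and the mask is my assumption of what ssemask holds (all
ones in the first three lanes, zero in the fourth):

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvtepi32_ps */

/* Round v[0..2] to whole numbers using the current MXCSR rounding mode and
 * leave v[3] untouched. Like the assembly, this reads and writes 16 bytes,
 * so v must point at four accessible floats. */
void snap_vector_sse2(float *v)
{
    /* all ones in lanes 0-2, zero in lane 3 (assumed content of ssemask) */
    const __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    __m128 orig    = _mm_loadu_ps(v);              /* movups: read 4 values   */
    __m128 xyz     = _mm_and_ps(orig, mask);       /* andps: clear 4th value  */
    __m128 w       = _mm_andnot_ps(mask, orig);    /* andnps: keep only 4th   */
    __m128 rounded = _mm_cvtepi32_ps(_mm_cvtps_epi32(xyz)); /* cvtps2dq, cvtdq2ps */

    _mm_storeu_ps(v, _mm_or_ps(rounded, w));       /* orps + movups           */
}
```

As with the assembly, this relies on the global rounding mode being
round-to-nearest.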

Overall I don't think that this makes a noticeable difference for either the
performance or the precision of the game. But when it doesn't make a
difference I'd always stick to the standard settings.

If SSE is available, we can use the cvtt* opcodes for fast truncating
conversions and still keep the standard round-to-nearest mode for everything
else. If we have no SSE we can choose the "correct" or "fast" float-to-int
conversion, but I'd prefer to keep the default rounding mode for all the
other operations.
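
As an illustration of the SSE case, here is a sketch of a color conversion
built on cvttss2si. The name color_to_byte and the clamping are my own choices
for the example and not the existing Q_ftol code:

```c
#include <xmmintrin.h>  /* _mm_set_ss, _mm_cvttss_si32 */

/* Convert a color component in [0, 1] to a byte with a truncating SSE
 * conversion. cvttss2si always truncates, so MXCSR is never modified and
 * the global round-to-nearest mode stays in place for everything else. */
unsigned char color_to_byte(float c)
{
    int i = _mm_cvttss_si32(_mm_set_ss(c * 255.0f));

    if (i < 0)
        i = 0;
    if (i > 255)
        i = 255;
    return (unsigned char)i;
}
```

For example, color_to_byte(0.5f) truncates 127.5 to 127 without touching the
rounding mode.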

--
> Thilo Schulz
>
> _______________________________________________
> ioquake3 mailing list
> ioquake3@lists.ioquake.org
> http://lists.ioquake.org/listinfo.cgi/ioquake3-ioquake.org
> By sending this message I agree to love ioquake3 and libsdl.
>

