Sorry,

I somehow only now got to read more of your email. Here's my reply:

On Thursday, 14. July 2011 11:35:29 Matthias Bentrup wrote:
> Also regarding rev 2052, I think that changing the default rounding mode
> from round-to-nearest to round-to-zero should be avoided, as it is less
> precise and round-to-nearest is in fact what all libraries expect.

Actually, I always thought round-towards-zero behaviour was the standard? When 
compiling a test program with gcc that casts float to int, this is the 
behaviour you get. All I'm doing is making this consistent across all platforms 
by explicitly setting the FPU control word.
This mirrors what id did when they introduced the snapvector trap call for VMs. 
They originally used a simple float cast for snapvector (fld + fistp), which 
resulted in different behaviour on different operating systems/platforms.
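
To make the difference concrete, here is a minimal sketch, assuming an x86 
target with SSE. The helper names cast_to_int and convert_with_mode are made up 
for illustration and are not taken from the ioquake3 tree. A C cast always 
truncates, while cvtss2si follows whatever rounding mode is set in the mxcsr:

```c
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

/* Plain C cast: the language defines this as truncation towards zero,
 * independent of the FPU/SSE rounding mode. */
static int cast_to_int(float f)
{
    return (int)f;
}

/* Hypothetical helper: convert with cvtss2si under a given MXCSR rounding
 * mode (_MM_ROUND_NEAREST, _MM_ROUND_TOWARD_ZERO, ...), then restore the
 * previous mode. The volatile accesses keep the conversion between the
 * two control register writes. */
static int convert_with_mode(float f, unsigned int mode)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    volatile float in = f;
    volatile int out;

    _MM_SET_ROUNDING_MODE(mode);
    out = _mm_cvtss_si32(_mm_set_ss(in));  /* honours the MXCSR mode */
    _MM_SET_ROUNDING_MODE(saved);

    return out;
}
```

With round-to-nearest, 2.7f converts to 3 and -2.7f to -3, whereas the cast and 
the round-toward-zero mode both give 2 and -2. Note that this sketch only 
touches the SSE mxcsr, not the x87 control word used by fld + fistp.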

> Q_ftol is only used in places where precision is
> not important (e.g. converting colors from float to byte), but the default
> rounding mode affects every fp operation, so it could influence the game
> physics etc.

The game physics are calculated by the VMs, and these have special conversion 
functions.

> This is especially bad as you have to switch the mxcsr back to
> round-to-nearest for the snapvector function. Control register changes are
> very expensive and should be avoided.

This is what's been happening for 10 years in quake3 already, if you check the 
code before I applied those changes.

> Speaking of snapvector, I also would avoid the maskmovdqu instruction as it
> writes directly to memory, bypassing the cache. When you re-read the vector
> later you will always encounter a cache-miss, so I think it would be better
> to write the result back with movups, preserving the original 4th value:
> 
> qsnapvectorsse PROC
>   movaps xmm1, ssemask ; initialize the mask register
>   movups xmm0, [rcx]   ; here is stored our vector. Read 4 values in one go
>   movaps xmm2, xmm0    ; keep a copy of the original data
>   andps xmm0, xmm1     ; set the fourth value to zero in xmm0
>   andnps xmm1, xmm2    ; set values one, two and three to zero in xmm1
>   cvtps2dq xmm0, xmm0  ; convert 4 single fp to int
>   cvtdq2ps xmm0, xmm0  ; convert 4 int to single fp
>   orps xmm0, xmm1      ; combine all 4 values again
>   movups [rcx], xmm0   ; write 3 rounded and 1 unchanged values back to memory
>   ret
> qsnapvectorsse ENDP
> 
> (assuming the global round mode is round-to-nearest ofc).

Hmm, yeah. I guess that would work too. Did you do performance comparisons via 
things like profiling?
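
For comparing the two variants outside of MASM, the quoted routine could be 
written with SSE2 intrinsics roughly as follows. This is only a sketch under 
assumptions: the name snapvector_sse2 is hypothetical, the mask constant stands 
in for ssemask, and the mxcsr is assumed to be in round-to-nearest mode:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */

/* Hypothetical name. Rounds v[0..2] to whole numbers and leaves v[3] as is.
 * Like the assembly version it loads and stores 16 bytes, so v must point
 * at four readable and writable floats. */
static void snapvector_sse2(float *v)
{
    /* All bits set in lanes 0..2, zero in lane 3 (stand-in for ssemask). */
    const __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));

    __m128 orig = _mm_loadu_ps(v);           /* movups: read 4 values      */
    __m128 xyz  = _mm_and_ps(orig, mask);    /* andps: zero the 4th value  */
    __m128 w    = _mm_andnot_ps(mask, orig); /* andnps: keep only the 4th  */

    /* cvtps2dq + cvtdq2ps: round using the current MXCSR rounding mode. */
    xyz = _mm_cvtepi32_ps(_mm_cvtps_epi32(xyz));

    _mm_storeu_ps(v, _mm_or_ps(xyz, w));     /* orps + movups: write back  */
}
```

The store is an ordinary movups, so the result goes through the cache rather 
than bypassing it as maskmovdqu does.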

-- 
Thilo Schulz


_______________________________________________
ioquake3 mailing list
ioquake3@lists.ioquake.org
http://lists.ioquake.org/listinfo.cgi/ioquake3-ioquake.org
By sending this message I agree to love ioquake3 and libsdl.
