Ok, my final solution is to add the D3DCREATE_FPU_PRESERVE flag when creating the device. It didn't harm performance in any noticeable way. I was under the impression that SSE would be affected by this, too. Additionally, I was under the impression that single (float) precision would suffice for time.time(). Obviously I was blatantly wrong :-) Thanks to Gabriel, Ross and Roel for commenting on this and sharing their insights!
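For anyone who wants to see why float precision is not enough for time.time(), here is a minimal sketch. It only emulates the effect by rounding values to single precision with struct; it does not touch the real FPU control word, and the helper name to_float32 and the timestamp are made up for illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (a C double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical timestamp of the magnitude time.time() returns (seconds since 1970).
t0 = 1200000000.0

# A quarter of a second later, as a double and rounded to single precision.
t1 = t0 + 0.25
print(t1 - t0)                          # double precision keeps the 0.25 s step
print(to_float32(t1) - to_float32(t0))  # single precision loses it entirely

# Between 2**30 and 2**31 adjacent single-precision values are 128 apart,
# so 100 seconds of elapsed time gets rounded to a full 128-second step.
print(to_float32(t0 + 100.0) - to_float32(t0))
```

With only 24 bits of mantissa, timestamps of this size can only be told apart in steps of 128 seconds, so any timing code built on differences of time.time() breaks.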
They should really make the FPU preserve flag the default. Leaving it off just causes very sneaky bugs.

-Matthias
-- 
http://mail.python.org/mailman/listinfo/python-list