Since it has not been mentioned yet: floating-point units on CPUs are complex beasts that carry state (rounding mode, precision control and the like) and other fun stuff.
I believe the most common instructions, like addition and multiplication, should be deterministic, but trigonometric functions might not be reproducible across different CPUs, or even across different FPU-initialization code on the same CPU. But I'm way outside my expertise here, so if someone knows more I'd love to be corrected.
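For anyone who wants to check this on their own hardware, here is a minimal sketch (not a definitive test) that dumps the exact bit patterns of a few results so they can be diffed between machines. The helper names `bits` and `fingerprint` and the sample inputs are made up for illustration. As far as I know, IEEE 754 requires addition and multiplication to be correctly rounded, so those lines should match anywhere that uses plain 64-bit doubles, while `sin` comes from the platform's math library and is not held to the same requirement.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the exact 64-bit pattern of a double as a big-endian hex string."""
    return struct.pack(">d", x).hex()


def fingerprint(values):
    """Map each operation name to the bit patterns of its results (hypothetical helper)."""
    return {
        "add": [bits(v + 0.1) for v in values],
        "mul": [bits(v * 0.1) for v in values],
        # math.sin delegates to the platform's C math library,
        # so this is the line most likely to differ between systems.
        "sin": [bits(math.sin(v)) for v in values],
    }


if __name__ == "__main__":
    # Arbitrary sample inputs; large arguments stress argument reduction in sin.
    for name, patterns in fingerprint([0.5, 1e10, 1e22]).items():
        print(name, " ".join(patterns))
```

Run it on two machines and diff the output. Comparing bit patterns instead of printed decimals avoids being fooled by rounding in the number formatting.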