I checked, and I am now convinced that the fault lies in the physics simulator: I tried other, simpler reinforcement learning environments and everything was reproducible, so it is not due to the neural network part (which is already impressive, I guess, as neural network libraries tend to be quite a mess reproducibility-wise). So it seems that something weird is going on with MuJoCo, the physics simulator that we packaged. More precisely, it seems to be the interaction between MuJoCo and the PyTorch neural network, because runs with random actions seem reproducible. I guess this could be due to floating-point rounding error, although the difference seems too large for that. The computation is quite long, so maybe the errors amplify, but I am a bit doubtful: I found complete reproducibility between my laptop and some powerful servers with very different hardware. If the problem were rounding error, wouldn't the results differ across such different hardware?
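As an illustration of how a long computation can amplify a tiny difference, here is a minimal Python sketch. It uses a toy chaotic recurrence (the logistic map), not the MuJoCo/PyTorch code, and the function names are hypothetical. It only shows that a gap of about 1e-15 at the start can become large after enough steps, while rerunning the exact same computation stays bit-identical.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the chaotic logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0=0.3, eps=1e-15, steps=100):
    """Absolute gap, step by step, between two runs whose start differs by eps."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + eps, steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence()
    # Print the gap every ten steps to see how it grows.
    for step in range(0, 101, 10):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

If the real simulation behaves similarly, a large final difference would not by itself rule out a rounding-level cause; that remains an assumption to check, not a conclusion.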
Is there a way to check whether this is due to floating-point rounding error? I tried using Float64 instead of Float32, and the results are still non-reproducible (although the values change a little, on the order of 10^{-5}).

Thanks,
Timothée

----- Original message -----
> From: "Andreas Enge" <andr...@enge.fr>
> To: "Ludovic Courtès" <ludovic.cour...@inria.fr>
> Cc: "Timothee Mathieu" <timothee.math...@inria.fr>, "Steve George" <st...@futurile.net>, "Cayetano Santos" <csant...@inventati.org>, "help-guix" <help-guix@gnu.org>
> Sent: Tuesday, 6 May 2025 10:30:12
> Subject: Re: Reproducibility of guix shell container across different host OS

> On Tue, May 06, 2025 at 09:26:51AM +0200, Ludovic Courtès wrote:
>> Do you have evidence that the problem is a leak like this? Or could it
>> be that the Python code being run is non-deterministic?
>>
>> If you run ‘guix shell -CN --no-cwd coreutils’, you can see with ‘ls’
>> etc. that nothing leaks from the host OS (apart of course from the
>> kernel).
>
> Or maybe the hardware "leaks"? Are the two machines exactly identical,
> in particular, do they have the exact same processor? Since the
> differences involve floating point computations, I would not be
> surprised if the precise processor architecture made a difference.
>
> Someone mentioned the IEEE-754 standard in the thread, which mandates
> that basic arithmetic operations follow a precise, deterministic
> semantics, but not necessarily trigonometric functions.
>
> Also, if I remember well, special flags are required to make GCC emit
> IEEE conforming code; otherwise the old, but faster x86 80 bit extended
> precision built into the processor is used. I have seen a case where
> *printing* a variable changed its value, because this meant it would be
> moved from an 80 bit processor register to a 64 bit memory location.
> Otherwise said, something like the following code:
>    double x = ...;
>    if (x!=some value) {
>       printf ("%f", x);
>       if (x!=some value) // the same value as above, of course
>          printf ("0");
>       else
>          printf ("1");
>    }
> would print x, followed by "1"...
>
> See this thread:
>    https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00277.html
> and commit 098bd280f82350073e8280e37d56a14162eed09c .
>
> If you want deterministic, reproducible floating point computations,
> I am afraid you would need to use the (comparably slow in low precision)
> GNU MPFR and GNU MPC libraries; or use interval arithmetic from FLINT
> and replace exact comparisons by looking at intersections of intervals.
>
> Andreas
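
On the question of checking for rounding-level differences, a minimal Python sketch follows. It uses only the standard library, and the helpers `bits` and `same_bits` are hypothetical names. It shows two things: floating-point addition is not associative, so merely changing the order of operations changes the result even on a single machine, and comparing IEEE-754 bit patterns rather than printed values shows whether two runs agree exactly.

```python
import struct


def bits(x):
    """Return the IEEE-754 binary64 bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def same_bits(xs, ys):
    """True if both sequences hold bit-identical doubles, element by element."""
    return len(xs) == len(ys) and all(bits(x) == bits(y) for x, y in zip(xs, ys))


# The same three numbers, added in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print(left == right)            # False: the grouping changes the rounding
    print(bits(left), bits(right))  # the two bit patterns differ
```

Dumping the values of two runs at each step and passing them to something like `same_bits` would locate the first step where they stop agreeing exactly. Whether that first gap is at the level of the last bit or already large is a hint about the cause, though not a proof.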