On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:

On 7 February 2015 at 19:52, otoomet wrote:
| random numbers.   For instance, can I be sure that
| set.seed(0); print(runif(1)); print(rnorm(1))
| will always print the same numbers, also in the future version of R?  There

Yes, pretty much.

This is nearly correct. The user could change the uniform or normal generator, since there are options other than the defaults, and then the result would be different. And obviously, if they changed the print precision, the printed result might be rounded differently.
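As a minimal sketch of that point: the same seed gives different draws once the generators are changed, and the printed value depends on the digits requested. The alternative kinds used here ("Wichmann-Hill" and "Box-Muller") are just two of the options listed in help(RNGkind), picked for illustration.

```r
old_kinds <- RNGkind()            # remember the generators currently in use

set.seed(0)
u_default <- runif(1)
z_default <- rnorm(1)

# Switch both the uniform and the normal generator (illustrative choices)
RNGkind(kind = "Wichmann-Hill", normal.kind = "Box-Muller")
set.seed(0)
u_other <- runif(1)
z_other <- rnorm(1)

# Same seed, different generators: the draws no longer match
print(c(u_default, u_other))
print(c(z_default, z_other))

# Same number, different print precision
print(u_default, digits = 3)
print(u_default, digits = 15)

do.call(RNGkind, as.list(old_kinds))   # restore the original generators
```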

I think you could prepare for future versions of R by saving information about the generators you are using. The precedent has already been set (R-1.7.0) that the default could change if there is a good reason. A good reason might be that the RNG is found not to be so good relative to others that become available. But I think the old generator would continue to be available, so people can reproduce old results. (Package setRNG has some utilities to help save and reset, but there is nothing especially difficult or fancy, just a few details that need to be remembered.)
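A rough sketch of what saving that information might look like, using only base R (the seed value and the file location are hypothetical, and this is not the setRNG interface):

```r
# Record everything needed to re-create the generator state later
rng_record <- list(
  kinds     = RNGkind(),          # uniform, normal (and, in newer R, sample) kinds
  seed      = 20150208L,          # hypothetical seed value
  r_version = R.version.string
)
record_file <- tempfile(fileext = ".rds")   # hypothetical location
saveRDS(rng_record, record_file)

set.seed(rng_record$seed)
first_run <- rnorm(3)

# ... later, possibly in a new session or a newer R ...
saved <- readRDS(record_file)
do.call(RNGkind, as.list(saved$kinds))   # select the recorded generators
set.seed(saved$seed)
second_run <- rnorm(3)

identical(first_run, second_run)         # TRUE when both runs use the same R
```

Selecting the kinds explicitly, rather than relying on the defaults, is what protects the second run if a default ever changes.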

I've been lurking here over fifteen years, and while I am getting old and
forgetful I can remember exactly one such change where behaviour was changed,
and (one of the) generators was altered---if memory serves, in the early
days of R 1.*. [Goes digging...] Yes, see `help(RNGkind)` which
details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
the old value, and "Kinderman-Ramage" was repaired.  There once was a similar
fix in the very early days of the Mersenne-Twister which is why the GNU GSL
has two variants with suffixes _1998 and _1999.

I seem to recall a bit of change around R-0.49 but old and forgetful would cover this too. For me, a bigger change was an unadvertised change in Splus - they compiled against a different math library at some point. This changed the lower bits in results, mostly insignificant but accumulated simulation results could amount to something fairly important. The amount of time I spent trying to find why results would not reproduce was one of my main motivations for starting to use R.

So your issue seems like pilot error to me:  don't attach the parallel package
if you do not plan to work in parallel.  But "do if you do", and see its fine
vignette on how it provides you reproducibility for multiple RNG streams.

In general, you can very much trust R (and R Core) in these matters.

Dirk

On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
> On Sat, Feb 7, 2015 at
> I don't know if there is intention to keep this reproducible across R
> versions, but it is already not reproducible across platforms (with
> the same R version):
> http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility

The situation is better in some respects, and worse in others, than what is described on stackoverflow. I think the point is made pretty well there that you should not be trying to reproduce results beyond machine precision. My experience is that you can usually compare within a fuzz of 1e-14, even across platforms. (The package setRNG on CRAN has a function random.number.test() which is run in the package's tests/ and makes uniform and normal comparisons to 1e-14. It has passed checks on all R platforms since 2004. Actually, the checks have been done since about 1995, but they were part of package dse earlier.) If you accumulate lots of lower order parts (e.g. sum(simulated - true) in a long Monte Carlo) then the fuzz may need to get much larger, especially when comparing across platforms. And you will have trouble with numerically unstable calculations. Once upon a time I was annoyed by this, but then I realized that it was better not to do unstable calculations.
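To illustrate the idea of comparing within a fuzz rather than exactly, here is a small sketch. The helper same_within_fuzz() is hypothetical (it is not the setRNG test), and the "other platform" values are simulated by perturbing the low-order bits, which is an assumption made purely for illustration.

```r
# Hypothetical helper: element-wise comparison within an absolute fuzz
same_within_fuzz <- function(a, b, fuzz = 1e-14) {
  length(a) == length(b) && all(abs(a - b) <= fuzz)
}

set.seed(0)
reference <- runif(5)

# Simulated platform difference: a relative change of a few units in the last place
other <- reference * (1 + 4 * .Machine$double.eps)

identical(reference, other)          # FALSE: exact comparison is too strict
same_within_fuzz(reference, other)   # TRUE: equal to within 1e-14

# Accumulating many such values lets the small differences add up
big_ref   <- sum(rep(reference, 1e5))
big_other <- sum(rep(other, 1e5))
abs(big_ref - big_other) <= 1e-14    # FALSE: the fuzz needs to be larger here
```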

In addition to results not being reproducible beyond machine precision across R versions and across platforms, you really cannot be guaranteed this even on the same platform with the same version of R. You may get different results if you upgrade the OS and there has been a change in the math libraries. In my experience this happens rather often. I don't think there is any specific 32 vs 64 bit issue, but math libraries sometimes do things a bit differently on different processors (e.g. processor bug fixes), so you can occasionally get differences with everything the same except the hardware.
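One modest precaution, sketched here as a suggestion rather than a fix, is to store a description of the environment next to the results, so that a later mismatch can at least be traced to a changed OS, R version or machine. The list layout is hypothetical.

```r
# Keep a description of the environment alongside the simulation output
run_record <- list(
  session   = sessionInfo(),          # R version, platform, attached packages
  eps       = .Machine$double.eps,    # machine precision on this build
  pointer   = .Machine$sizeof.pointer,  # 4 on 32-bit builds, 8 on 64-bit builds
  timestamp = Sys.time()
)
print(run_record$session)
```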


On 02/07/2015 10:52 PM, otoomet wrote:
> It turned out that this is because package "parallel", buried deep
> in my dependencies, calls runif() during its initialization and
> in this way changes the random number sequence.

Guessing a bit about what you are saying: 1/ you set the random seed; 2/ you did some things, which included loading package parallel; 3/ you ran some things for which you expected to get results comparable to some previous run in which you did 1/ and 2/ in the reverse order.

If I understand this correctly, I suggest you always do everything exactly the same after you set the seed. There are lots of things that could generate random numbers without you really knowing. Thus, it is usually better to set the seed immediately before you start doing anything where you want the seed to have a known state. (There is an even better suggestion in the somewhat dated vignette with package setRNG.)
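A small sketch of why the position of set.seed() matters. The function load_something() is a hypothetical stand-in for any code, such as a package being loaded, that quietly draws a random number.

```r
# Stand-in for code that draws a random number as a side effect
# (hypothetical function, for illustration only)
load_something <- function() invisible(runif(1))

# Seed set early: the hidden draw shifts the stream
set.seed(0)
load_something()
early <- rnorm(3)

# Seed set immediately before the part that must be reproducible
load_something()
set.seed(0)
late <- rnorm(3)

# What the seed alone would give
set.seed(0)
expected <- rnorm(3)

identical(early, expected)   # FALSE: the hidden draw got in the way
identical(late, expected)    # TRUE: nothing ran between seed and draws
```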

Finally, if you do intend to use parallel sometimes then you have additional considerations. You would like to get the same results no matter how many machines you are using. This may place some constraints on the generators you use, since not all are equally easy to use in parallel. So if you are hoping to get the same results in parallel as you get on a single machine, then you had better start out using generators on the single machine that you will be able to use in parallel.
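One way to approach this, sketched under the assumption that the work can be split into a fixed number of tasks, is to use the "L'Ecuyer-CMRG" generator and give each task (not each worker) its own stream with nextRNGStream() from package parallel. The seed, the task count and run_task() are hypothetical, and everything runs in a single process here just to show that the results do not depend on the order in which tasks are executed.

```r
library(parallel)                     # attach first, before touching the seed

old_kinds <- RNGkind()
RNGkind("L'Ecuyer-CMRG")              # a generator that supports multiple streams
set.seed(123)                         # hypothetical seed

# One stream per task, derived up front from the single seed
n_tasks <- 4
task_seeds <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  task_seeds[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical task: install its own stream, then simulate
run_task <- function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  rnorm(3)
}

in_order  <- lapply(task_seeds, run_task)
reordered <- rev(lapply(rev(task_seeds), run_task))
identical(in_order, reordered)        # same results whatever the execution order

do.call(RNGkind, as.list(old_kinds))  # restore the original generators
```

Because each task carries its own stream, the same list of seeds could be handed to one worker or to several, which is the property wanted when moving between a single machine and a cluster.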

Paul

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
