On Fri, Mar 02, 2012 at 10:36:14AM +0100, Karl Forner wrote: [...] > Hello, > I would be also in favor for using multiple seeds based on (seed, > task_number) for convenience (i.e. avoiding storing the seeds) > and with the possibility of having a dynamic number of tasks, but I am mot > sure it is theoretically correct. > But I can refer you to this article: > http://www.agner.org/random/ran-instructions.pdf , section 6.1 > where the author states: > > For example, if we make 100 streams of 10^10 random numbers each from an > > SFMT > > generator with cycle length ρ = 2^11213, we have a probability of overlap > > p ≈ 10^3362. > > > > What do you think ? I am very concerned by the correctness of this approach > so would appreciate any advice on that matter.
Hi. If correctness is crucial, then the literature on cryptography provides the best answers. At the current state of knowledge, it is not possible to prove that a generator is undistinguishable from truly random numbers in a mathematical sense. The problem is that if we can prove this for any generator, then the proof also implies P \not= NP. This is an open problem, so proving that any generator is correct in the sense of undistinguishability is also open. The best, what we can have, is a generator, which cannot be distinguished from truly random numbers using the known methods, although experts on cryptography tried to find such a method intensively. An example of such a generator is Fortuna generator using AES, which is available at CRAN as "randaes". http://en.wikipedia.org/wiki/Fortuna_PRNG The same can be said about initializations. Initialization is a random number generator, whose output is used as the initial state of some other generator. There is no proof that a particular initialization cannot be distinguished from truly random numbers in a mathematical sense for the same reason as above. A possible strategy is to use a cryptographically strong hash function for the initialization. This means to transform the seed to the initial state of the generator using a function, for which we have a good guarantee that it produces output, which is computationally hard to distinguish from truly random numbers. For this purpose, i suggest to use the package rngSetSeed provided currently at http://www.cs.cas.cz/~savicky/randomNumbers/ It is based on AES and Fortuna similarly as "randaes", but these components are used only for the initialization of Mersenne-Twister. When the generator is initialized, then it runs on its usual speed. In the notation of http://www.agner.org/random/ran-instructions.pdf using rngSetSeed for initialization of Mersenne-Twister is Method 4 in Section 6.1. I appreciate comments. Petr Savicky. P.S. I included some more comments on the relationship of provably good random number generators and P ?= NP question to the end of the page http://www.cs.cas.cz/~savicky/randomNumbers/ ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel