I’ll also back-track a bit from my advice in the original support site posting, as it turns out C++11’s <random> is not guaranteed to be reproducible across platforms.
That is to say, the RNG engines are portably defined across implementations, but the distribution classes (that convert the random stream into values of the desired distribution) turn out to be implementation-defined. As such, you can get different distribution values from the same seed with different compilers. Fun, huh? So much for “standard”! So, by using <random>, I ended up trading irreproducibility across workers for irreproducibility across platforms. Switching to boost::random (as provided by the BH package) seems to fix the problem, though one wonders how this was ever allowed to happen in the first place. -A > On 2 Jan 2019, at 14:45, Martin Morgan <[email protected]> wrote: > > I'll back-track on my advice a little, and say that the right way to enable > the user to get reproducible results is to respect the setting the user makes > outside your function. So for > > your = function() > unlist(bplapply(1:4, rnorm)) > > The user will > > register(MulticoreParam(2, RNGseed=123)) > your() > > to always produces the identical result. > > Following Aaron's strategy, the R-level approach to reproducibility might be > along the lines of > > - tell the user to set parallel::RNGkind("L'Ecuyer-CMRG") and set.seed() > - In your function, generate seeds for each job > > n = 5; seeds <- vector("list", n) > seeds[[1]] = .Random.seed # FIXME fails if set.seed or random nos. have > not been generated... > for (i in tail(seq_len(n), -1)) seeds[[i]] = nextRNGStream(seeds[[i - 1]]) > > - send these, along with the job, to the workers, setting .Random.seed on > each worker > > bpmapply(function(i, seed, ...) { > oseed <- get(".Random.seed", envir = .GlobalEnv) > on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv)) > assign(".Random.seed", seed, envir = .GlobalEnv) > ... > }, seq_len(n), seeds, ...) > > The use of L'Ecuyer-CMRG and `nextRNGStream()` means that the streams on each > worker are independent. Using on.exit means that, even on the worker, the > state of the random number generator is not changed by the evaluation. This > means that even with SerialParam() the generator is well-behaved. I don’t > know how BiocCheck responds to use of .Random.seed, which in general would be > a bad thing to do but in this case with the use of on.exit() the usage seems > ok. > > Martin > > > On 12/31/18, 3:17 PM, "Lulu Chen" <[email protected]> wrote: > > Hi Martin, > > > Thanks for your help. But setting different number of workers will > generate different results: > > >> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123))) > [1] 1.0654274 -1.2421454 1.0523311 -0.7744536 1.3081934 -1.5305223 > 1.1525356 0.9287607 -0.4355877 1.5055436 >> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123))) > [1] -0.9685927 0.7061091 1.4890213 -0.4094454 0.8909694 -0.8653704 > 1.4642711 1.2674845 -0.2220491 2.4505322 >> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.86537045 1.46427111 > 1.26748453 -0.48906078 0.43304237 -0.03195349 > [10] 0.14670372 >> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 -1.03886641 1.57451249 0.74708204 > [10] 0.67187201 > > > > Best, > Lulu > > > > On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan <[email protected]> > wrote: > > > The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of > bplapply() allow fully repeatable randomizations, e.g., > >> library(BiocParallel) >> unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 >> unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 >> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 > > The idea then would be to tell the user to register() such a param, or to > write your function to accept an argument rngSeed along the lines of > > f = function(..., rngSeed = NULL) { > if (!is.null(rngSeed)) { > param = bpparam() # user's preferred back-end > oseed = bpRNGseed(param) > on.exit(bpRNGseed(param) <- oseed) > bpRNGseed(param) = rngSeed > } > bplapply(1:4, rnorm) > } > > (actually, this exercise illustrates a problem with bpRNGseed<-() when the > original seed is NULL; this will be fixed in the next day or so...) > > Is that sufficient for your use case? > > On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" > <[email protected] on behalf of > [email protected]> wrote: > > Dear all, > > I posted the question in the Bioconductor support site ( > > https://support.bioconductor.org/p/116381/ > <https://support.bioconductor.org/p/116381/>) and was suggested to direct > future correspondence there. > > I plan to generate a vector of seeds (provided by users through > argument of > my R function) and use them by set.seed() in each parallel computation. > However, set.seed() will cause warning in BiocCheck(). > > Someone suggested to re-write code using c++, which is a good idea. > But it > will take me much more extra time to re-write some functions from other > packages, e.g. eBayes() in limma. > > Hope to get more suggestions from you. Thanks a lot! > > Best, > Lulu > > [[alternative HTML version deleted]] > > _______________________________________________ > [email protected] mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > <https://stat.ethz.ch/mailman/listinfo/bioc-devel> > > > > > > _______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel _______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
