Thanks for teaching me how to set a seed for each job! On Wed, Jan 2, 2019 at 9:45 AM Martin Morgan <mtmorgan.b...@gmail.com> wrote:
> I'll back-track on my advice a little, and say that the right way to > enable the user to get reproducible results is to respect the setting the > user makes outside your function. So for > > your = function() > unlist(bplapply(1:4, rnorm)) > > The user will > > register(MulticoreParam(2, RNGseed=123)) > your() > > to always produces the identical result. > > Following Aaron's strategy, the R-level approach to reproducibility might > be along the lines of > > - tell the user to set parallel::RNGkind("L'Ecuyer-CMRG") and set.seed() > - In your function, generate seeds for each job > > n = 5; seeds <- vector("list", n) > seeds[[1]] = .Random.seed # FIXME fails if set.seed or random nos. > have not been generated... > for (i in tail(seq_len(n), -1)) seeds[[i]] = nextRNGStream(seeds[[i - > 1]]) > > - send these, along with the job, to the workers, setting .Random.seed on > each worker > > bpmapply(function(i, seed, ...) { > oseed <- get(".Random.seed", envir = .GlobalEnv) > on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv)) > assign(".Random.seed", seed, envir = .GlobalEnv) > ... > }, seq_len(n), seeds, ...) > > The use of L'Ecuyer-CMRG and `nextRNGStream()` means that the streams on > each worker are independent. Using on.exit means that, even on the worker, > the state of the random number generator is not changed by the evaluation. > This means that even with SerialParam() the generator is well-behaved. I > don’t know how BiocCheck responds to use of .Random.seed, which in general > would be a bad thing to do but in this case with the use of on.exit() the > usage seems ok. > > Martin > > > On 12/31/18, 3:17 PM, "Lulu Chen" <luluc...@vt.edu> wrote: > > Hi Martin, > > > Thanks for your help. But setting different number of workers will > generate different results: > > > > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123))) > [1] 1.0654274 -1.2421454 1.0523311 -0.7744536 1.3081934 > -1.5305223 1.1525356 0.9287607 -0.4355877 1.5055436 > > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123))) > [1] -0.9685927 0.7061091 1.4890213 -0.4094454 0.8909694 > -0.8653704 1.4642711 1.2674845 -0.2220491 2.4505322 > > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.86537045 1.46427111 > 1.26748453 -0.48906078 0.43304237 -0.03195349 > [10] 0.14670372 > > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 -1.03886641 1.57451249 0.74708204 > [10] 0.67187201 > > > > Best, > Lulu > > > > On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan <mtmorgan.b...@gmail.com> > wrote: > > > The major BiocParallel objects (SnowParam(), MulticoreParam()) and use > of bplapply() allow fully repeatable randomizations, e.g., > > > library(BiocParallel) > > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 > > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 > > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123))) > [1] -0.96859273 -0.40944544 0.89096942 -0.48906078 0.43304237 > -0.03195349 > [7] -1.03886641 1.57451249 0.74708204 0.67187201 > > The idea then would be to tell the user to register() such a param, or > to write your function to accept an argument rngSeed along the lines of > > f = function(..., rngSeed = NULL) { > if (!is.null(rngSeed)) { > param = bpparam() # user's preferred back-end > oseed = bpRNGseed(param) > on.exit(bpRNGseed(param) <- oseed) > bpRNGseed(param) = rngSeed > } > bplapply(1:4, rnorm) > } > > (actually, this exercise illustrates a problem with bpRNGseed<-() when > the original seed is NULL; this will be fixed in the next day or so...) > > Is that sufficient for your use case? > > On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" < > bioc-devel-boun...@r-project.org on behalf of > luluc...@vt.edu> wrote: > > Dear all, > > I posted the question in the Bioconductor support site ( > > https://support.bioconductor.org/p/116381/ < > https://support.bioconductor.org/p/116381/>) and was suggested to direct > future correspondence there. > > I plan to generate a vector of seeds (provided by users through > argument of > my R function) and use them by set.seed() in each parallel > computation. > However, set.seed() will cause warning in BiocCheck(). > > Someone suggested to re-write code using c++, which is a good > idea. But it > will take me much more extra time to re-write some functions from > other > packages, e.g. eBayes() in limma. > > Hope to get more suggestions from you. Thanks a lot! > > Best, > Lulu > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel < > https://stat.ethz.ch/mailman/listinfo/bioc-devel> > > > > > > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel