On Mon, Jan 7, 2019 at 3:26 PM Henrik Bengtsson <[email protected]> wrote:
>
> 1. To achieve fully numerically reproducible RNGs in a way that is
> *invariant to the number of workers* (amount of chunking), I think the
> only solution is to pregenerate RNG seeds (using
> parallel::nextRNGStream()) for each individual iteration (element).
> In other words, if a worker will process K elements, then the main R
> process needs to generate K RNG seeds and pass those along to the
> worker. I use this approach for future.apply::future_lapply(...,
> future.seed = TRUE/<initial_seed>), which then produces identical RNG
> results regardless of backend and amount of chunking. In the past, I
> think I've seen Martin suggesting something similar as a manual
> approach to some users.
>
> 2. The above approach is obviously expensive, especially when there
> are a large number of elements to iterate over. Because of this I'm
> thinking of providing an option to use only one RNG seed per worker
> (which is the common approach used elsewhere)
> [https://github.com/HenrikBengtsson/future.apply/issues/20]. This
> won't be invariant to the number of workers, but it "should" still be
> statistically sound. This approach will give reproducible RNG results
> given the same initial seed and the same amount of chunking.
>
> 3. For algorithms which do not rely on RNG, we can ignore both of the
> above. The problem is that it's not always known to the
> user/developer which methods depend on RNG or not. The above 'RNG
> tracker' helps to identify some, but things might also change over
> time. I believe there's room for automating this in one way or the
> other. For instance, having a way to declare a function being
> dependent on RNG or not could help. Static code inspection could also
> do it, e.g. when an R package is built and it could be part of the R
> CMD checks to validate.
>
> 4. Are there other approaches?
>

I don't suppose it's possible to quickly determine via static analysis
whether a piece of code uses the RNG?
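
To make the question concrete, here is a rough sketch of what a quick
static check might look like. It assumes codetools::findGlobals() plus a
hand-maintained, deliberately incomplete list of RNG function names; the
names rng_functions and calls_rng_directly() are made up for illustration
and are not part of any package.

```r
library(codetools)

# Hypothetical, incomplete list of functions that draw random numbers.
rng_functions <- c("runif", "rnorm", "rbinom", "rpois", "rexp",
                   "rgamma", "rbeta", "rt", "sample", "sample.int")

# TRUE if 'fun' directly calls any function named in 'known'.
# Only the body of 'fun' itself is inspected, not the functions it calls.
calls_rng_directly <- function(fun, known = rng_functions) {
  called <- findGlobals(fun, merge = FALSE)$functions
  any(called %in% known)
}

f <- function(n) mean(rnorm(n))  # calls the RNG directly
g <- function(x) sum(x^2)        # no RNG involved
h <- function(n) f(n)            # reaches the RNG only through f()

calls_rng_directly(f)
calls_rng_directly(g)
calls_rng_directly(h)
```

Here f is flagged and g is not, but h is missed because it only reaches
the RNG through f(). A check like this would have to follow the call graph
(and cope with compiled code and dynamic dispatch) to be reliable, which
is probably where "quickly" stops being realistic.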
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
