Adding to the above, in the specific context of genetic programming, I'd suggest structuring each round like this:

1. Divide the population into N subsets, one per core, and trial them in parallel to generate fitness scores.

2. Parallel-mergesort to get a ranked order.

3. (apply map vector (partition num-cores (take num-to-keep sorted-population))) to cull all but the best num-to-keep and deal them out across N new subsets, with each subset a representative sample of the range of fitness scores (so any correlation between trial speed and fitness won't leave some subpopulations much slower than others and stall the round).

4. Apply any recombination/crossovers to generate, say, M new genomes in each subset (parallel; each thread uses the static data from the previous round and its own PRNG to decide which pairs of survivors to use, and adds the offspring to that thread's subpopulation).

5. Apply mutation (parallel; each thread mutates its own subpopulation).

6. Next trial...

There are no thread-order-of-action dependencies this way. Snapshot once a round by saving the subpopulations and per-thread PRNG internal states right before each fitness trial phase. A snapshot should evolve deterministically if restored with the same values for num-to-keep and num-cores and whatever other parameters, so the last snapshot before a crash should crash the same way every time.
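
To make the dealing-out and per-thread PRNG steps concrete, here is a minimal sketch. The names deal-out, make-prngs, mutate-subset and next-round are hypothetical, the genomes are plain integers, and java.util.Random stands in for whatever PRNG you actually use. Note that it partitions by num-cores, so the transpose yields exactly num-cores subsets of num-to-keep / num-cores genomes each.

```clojure
(import 'java.util.Random)

(defn deal-out
  "Keep the best num-to-keep genomes of sorted-population (best first) and
  deal them round-robin into num-cores subsets, so each subset spans the
  whole range of fitness ranks. Assumes num-to-keep is a multiple of
  num-cores; partition drops a trailing incomplete group otherwise."
  [num-cores num-to-keep sorted-population]
  (apply map vector
         (partition num-cores (take num-to-keep sorted-population))))

(defn make-prngs
  "One PRNG per subset, each seeded from a master PRNG whose seed is recorded."
  [master-seed n]
  (let [master (Random. master-seed)]
    (vec (repeatedly n #(Random. (.nextLong master))))))

(defn mutate-subset
  "Toy mutation (an assumption for illustration): add 0, 1 or 2 to each
  genome, drawing only from this subset's own PRNG."
  [^Random prng subset]
  (mapv #(+ % (.nextInt prng 3)) subset))

(defn next-round
  "Mutate every subset in parallel. Each subset touches only its own PRNG,
  so the result does not depend on thread scheduling."
  [prngs subsets]
  (vec (pmap mutate-subset prngs subsets)))

;; Toy population: a genome is just its rank, 0 = fittest.
(deal-out 3 12 (range 20))
;; => ([0 3 6 9] [1 4 7 10] [2 5 8 11])
```

Calling (next-round (make-prngs 42 3) (deal-out 3 12 (range 20))) twice with the same master seed should give identical results, which is the property the snapshot/replay scheme relies on.
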
On Sat, Dec 28, 2013 at 2:56 PM, Cedric Greevey <cgree...@gmail.com> wrote:

> On Sat, Dec 28, 2013 at 12:45 PM, Lee Spector <lspec...@hampshire.edu> wrote:
>
>> On Dec 28, 2013, at 11:27 AM, Cedric Greevey wrote:
>> >
>> > It helps to go with the "functional, immutable" flow, in which case if
>> > you get an unwanted exception it should *usually* have bubbled up from
>> > some failing test. Add a dump-locals where suggested by the stack trace
>> > and rerun the failing test and voila! That should do it for almost all
>> > non-exogenous exceptions, leaving mainly things like network timeouts
>> > and other wonkiness caused by factors outside of your code (and, often,
>> > outside of your control anyway).
>>
>> You've given me some interesting things to think about re: the role of
>> testing, but I think that it may be hard to map your approach directly on
>> to the kind of work that I do.
>>
>> I often work with stochastic simulations which run for days and for which
>> repeatability is hard to engineer, especially in a multicore context.
>> There's a lot of unpredictable dynamism and usually code is generated and
>> run dynamically (and mutated and recombined; this is "genetic
>> programming"). Even if you code functionally and immutably (which I try to
>> do, to a reasonable extent), and stamp out all nondeterminism (which would
>> be a pain), it may take days to re-create a situation.
>
> Your requirements are unusual.
>
> That being said, you might want to consider:
>
> 1. Using a PRNG with recordable seed, and sane concurrency semantics, to
> achieve repeatability -- rerun with same seed to get identical replay of
> events.
>
> 2. If crashes are happening after days, add snapshotting -- some ability
> to save the state of the whole simulation from time to time (including
> current PRNG state). Use the last snapshot before a crash to investigate
> the crash. Requires item 1, above, for rerunning from the same snapshot to
> produce unvarying results.
>
> I'd suggest using a ref world, with a periodically waking thread that does
> a (spit (dosync (dump-all-the-refs-to-some-data-structure))) or something.
> (If retries become a big problem you'll need to add more coordination,
> maybe using core.async to get everything else to take a breather during
> each state dump.) You also need order-independence (which suggests a
> deterministic breaking up of the world into the domains of different
> threads, with defined interaction channels and times, and a separate PRNG
> per thread -- I'd suggest a state-dumpable Mersenne Twister instance per
> thread, seeded at startup using values from java.util.Random, itself seeded
> with a known startup seed).
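
For reference, the ref-world state dump suggested in the quoted message might look roughly like this. It is a sketch under assumptions: the refs population and generation and the functions snapshot, save-snapshot! and restore-snapshot! are hypothetical names, the state is assumed to be plain EDN-printable data, and PRNG state is not covered here (it would need its own dumpable representation).

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical world: two refs standing in for the simulation state.
(def population (ref [[0 3 6 9] [1 4 7 10] [2 5 8 11]]))
(def generation (ref 0))

(defn snapshot
  "Read every ref inside a single transaction, so the values returned
  are mutually consistent."
  []
  (dosync {:population @population
           :generation @generation}))

(defn save-snapshot!
  "Write the current snapshot to path as EDN text."
  [path]
  (spit path (pr-str (snapshot))))

(defn restore-snapshot!
  "Read a snapshot back from path and reset all refs in one transaction."
  [path]
  (let [{p :population g :generation} (edn/read-string (slurp path))]
    (dosync
      (ref-set population p)
      (ref-set generation g))))
```

A periodically waking thread would simply call save-snapshot! with a fresh file name each time. Doing the file write outside the dosync, as above, keeps I/O out of the transaction body, which may be retried.
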