Two optimizations I have not seen mentioned so far; don't be so Groovy ;-) 1. Replace the the Map<> with an array of primitive ints. Why use an integer as a key into a hash map when it can be used as an array index? int[] results = new int[19] // since we need to index values 3..18
2. Replace the N.times{} or (1..N).each{} loops with a good old fashioned for(int i=0;i<N;i++) loop. I was actually surprised how much of an improvement this made if @CompileStatic was not used. For statically compiled code this didn't really make a difference, but for dynamic code the speed up is huge. Of course, the big winner is using @CompileStatic, which when combined with using an array results in a ~10x speed up. For my tests I basically took the code posted by John and James and ran 50 rounds of 1M iterations. Here are some representative total times on my oldish quad core MacBook Pro: @CompileDynamic Map + N.times = 17 seconds Map + for-loop = 7 seconds Array + N.times = 13 seconds Array + for-loop = 3 seconds @CompileStatic Map + N.times = 3.4 seconds Map + for-loop = 3.0 seconds Array + N.times = 1.4 seconds Array + for-loop = 1.4 seconds Interesting, dynamically compiled code using a primitive array and a for-loop performed just as well as statically compiled code doing it the "Groovy way". - Keith > On Mar 28, 2017, at 4:25 PM, Paul Moore <p.f.mo...@gmail.com> wrote: > > I'm very much a newbie with Groovy, so I apologise in advance if this > is not the right place for questions like this. I couldn't find > anywhere else that looked like a better option - if there is somewhere > I should have asked, feel free to redirect me. > > I want to write a simulation script using Groovy - this is something > of a hobby challenge for me, I have a friend who has done a similar > task in C++, and I'm looking for a more user-friendly language to > write the code, while not losing too much performance over the C > version. > > The code is basically to simulate "a game". The particular game is > defined by the user, as a function that generates scores. The program > runs the game many times, and summarises the distribution of the > results. Basically, a Monte Carlo simulation. My current code in > Groovy for this, using "roll 3 dice and add up the results" as the > target game, looks as follows: > > @Grapes( > @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1') > ) > import org.apache.commons.math3.random.MersenneTwister > > def benchmark = { closure -> > start = System.currentTimeMillis() > closure.call() > now = System.currentTimeMillis() > now - start > } > > rng = new MersenneTwister() > > int roll() { > rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3 > } > > int N = 1000000 > def results = [:] > > def time = benchmark { > N.times { > int n = roll() > results[n] = results.containsKey(n) ? results[n] + 1 : 1 > } > } > > println "Took ${time/100} sec" > for (e in results.sort()) { > println "$e.key: $e.value" > } > > This does exactly what I want, but takes about 5 seconds to run the > simulation on my PC. My friend's C++ code runs a similar simulation in > about 0.1 second. That's a massive penalty for Groovy, and likely > means that for more realistic simulations (which would be a lot more > complex than 3d6!) I wouldn't even be close to competitive. > > The code above is completely unoptimised. I know that Groovy's dynamic > programming features can introduce some overhead, but I also get the > impression from the documentation that by careful use of exact types, > and similar techniques, this can be speeded up a lot (the docs claim > potentially better than C performance in some cases). > > What should I be looking at to optimise the above code? The areas I > can think of are: > > 1. The RNG. I assume that the apache commons code is pretty efficient, > though. I do want a reasonably decent RNG, and I'd heard that the JVM > RNG is not sufficient for simulation. For now I'm assuming that this > is sufficiently good. > 2. The roll() function. This is the core of the inner loop, and likely > the big bottleneck. I've declared the type, which I guess is the first > step, but I don't know what else I can do here. I tried a > CompileStatic annotation, but that gave me errors about referencing > the rng variable. I'm not sure what that implies - is my code doing > something wrong in how it references the global rng variable? > 3. Collecting the results in a map is likely not ideal. Is there a > better data structure I should be using? I basically want to be able > to count how many times each result appears - results will be integers > (in certain cases I might want non-integers but I can handle them > exceptionally) but I don't necessarily know the range in advance (so > I'd avoid a static array unless it gives significant performance > benefits - when I did a quick test, I got about a second faster > runtime, noticeable, but not enough to get me anywhere near my sub-1 > second target) > > I tried using the GProf profiler to see if that shed any light on what > I should do. When I ran with a realistic number of iterations it took > forever and then failed with an out of memory error. Dropping it to > 10000 iterations, I got > > % cumulative self self total self total > self total > time seconds seconds calls ms/call ms/call min ms min ms > max ms max ms name > 34.5 0.04 0.04 10000 0.00 0.00 0.00 0.00 > 0.91 1.13 blackpool$_run_closure2$_closure3.roll > 18.5 0.06 0.02 39984 0.00 0.00 0.00 0.00 > 0.53 0.53 java.lang.Integer.plus > 15.9 0.08 0.02 10000 0.00 0.01 0.00 0.00 > 0.28 1.26 blackpool$_run_closure2$_closure3.doCall > 11.0 0.10 0.01 30000 0.00 0.00 0.00 0.00 > 0.36 0.36 org.apache.commons.math3.random.MersenneTwister.nextInt > 5.1 0.10 0.00 10000 0.00 0.00 0.00 0.00 > 0.19 0.19 java.util.LinkedHashMap.containsKey > 4.4 0.11 0.00 10000 0.00 0.00 0.00 0.00 > 0.02 0.02 java.util.LinkedHashMap.putAt > 4.0 0.12 0.00 1 5.25 125.62 5.25 125.62 > 5.25 125.62 java.lang.Integer.times > 3.9 0.12 0.00 9984 0.00 0.00 0.00 0.00 > 0.02 0.02 java.util.LinkedHashMap.getAt > 2.2 0.12 0.00 1 2.85 128.48 2.85 128.48 > 2.85 128.48 blackpool$_run_closure2.doCall > > But I don't really know how to interpret that. (Also, am I somehow > using GProf wrongly? It seems like it shouldn't run out of memory > profiling a 5-second program run...) > > Can anyone offer any advice on what I should be looking at here? > > Thanks, > Paul ---------------------- Keith Suderman Research Associate Department of Computer Science Vassar College, Poughkeepsie NY suder...@cs.vassar.edu