Re: Optimising a Groovy script

Keith Suderman Wed, 29 Mar 2017 12:55:27 -0700

Two optimizations I have not seen mentioned so far; don't be so Groovy ;-)

1. Replace the the Map<> with an array of primitive ints. Why use an integer as 
a key into a hash map when it can be used as an array index? 
        int[] results = new int[19] // since we need to index values 3..18


2. Replace the N.times{} or (1..N).each{} loops with a good old fashioned 
for(int i=0;i<N;i++) loop. I was actually surprised how much of an improvement 
this made if @CompileStatic was not used.  For statically compiled code this 
didn't really make a difference, but for dynamic code the speed up is huge.

Of course, the big winner is using @CompileStatic, which when combined with 
using an array results in a  ~10x speed up.

For my tests I basically took the code posted by John and James and ran 50 
rounds of 1M iterations.  Here are some representative total times on my oldish 
quad core MacBook Pro:

@CompileDynamic
Map + N.times = 17 seconds
Map + for-loop = 7 seconds
Array + N.times = 13 seconds
Array + for-loop = 3 seconds

@CompileStatic
Map + N.times = 3.4 seconds
Map + for-loop = 3.0 seconds
Array + N.times = 1.4 seconds
Array + for-loop = 1.4 seconds


Interesting, dynamically compiled code using a primitive array and a for-loop 
performed just as well as statically compiled code doing it the "Groovy way".

- Keith


> On Mar 28, 2017, at 4:25 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
> 
> I'm very much a newbie with Groovy, so I apologise in advance if this
> is not the right place for questions like this. I couldn't find
> anywhere else that looked like a better option - if there is somewhere
> I should have asked, feel free to redirect me.
> 
> I want to write a simulation script using Groovy - this is something
> of a hobby challenge for me, I have a friend who has done a similar
> task in C++, and I'm looking for a more user-friendly language to
> write the code, while not losing too much performance over the C
> version.
> 
> The code is basically to simulate "a game". The particular game is
> defined by the user, as a function that generates scores. The program
> runs the game many times, and summarises the distribution of the
> results. Basically, a Monte Carlo simulation. My current code in
> Groovy for this, using "roll 3 dice and add up the results" as the
> target game, looks as follows:
> 
> @Grapes(
>    @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
> )
> import org.apache.commons.math3.random.MersenneTwister
> 
> def benchmark = { closure ->
>  start = System.currentTimeMillis()
>  closure.call()
>  now = System.currentTimeMillis()
>  now - start
> }
> 
> rng = new MersenneTwister()
> 
> int roll() {
>    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
> }
> 
> int N = 1000000
> def results = [:]
> 
> def time = benchmark {
>    N.times {
>        int n = roll()
>        results[n] = results.containsKey(n) ? results[n] + 1 : 1
>    }
> }
> 
> println "Took ${time/100} sec"
> for (e in results.sort()) {
>    println "$e.key: $e.value"
> }
> 
> This does exactly what I want, but takes about 5 seconds to run the
> simulation on my PC. My friend's C++ code runs a similar simulation in
> about 0.1 second. That's a massive penalty for Groovy, and likely
> means that for more realistic simulations (which would be a lot more
> complex than 3d6!) I wouldn't even be close to competitive.
> 
> The code above is completely unoptimised. I know that Groovy's dynamic
> programming features can introduce some overhead, but I also get the
> impression from the documentation that by careful use of exact types,
> and similar techniques, this can be speeded up a lot (the docs claim
> potentially better than C performance in some cases).
> 
> What should I be looking at to optimise the above code? The areas I
> can think of are:
> 
> 1. The RNG. I assume that the apache commons code is pretty efficient,
> though. I do want a reasonably decent RNG, and I'd heard that the JVM
> RNG is not sufficient for simulation. For now I'm assuming that this
> is sufficiently good.
> 2. The roll() function. This is the core of the inner loop, and likely
> the big bottleneck. I've declared the type, which I guess is the first
> step, but I don't know what else I can do here. I tried a
> CompileStatic annotation, but that gave me errors about referencing
> the rng variable. I'm not sure what that implies - is my code doing
> something wrong in how it references the global rng variable?
> 3. Collecting the results in a map is likely not ideal. Is there a
> better data structure I should be using? I basically want to be able
> to count how many times each result appears - results will be integers
> (in certain cases I might want non-integers but I can handle them
> exceptionally) but I don't necessarily know the range in advance (so
> I'd avoid a static array unless it gives significant performance
> benefits - when I did a quick test, I got about a second faster
> runtime, noticeable, but not enough to get me anywhere near my sub-1
> second target)
> 
> I tried using the GProf profiler to see if that shed any light on what
> I should do. When I ran with a realistic number of iterations it took
> forever and then failed with an out of memory error. Dropping it to
> 10000 iterations, I got
> 
> %    cumulative   self            self     total    self    total
> self    total
> time   seconds    seconds  calls  ms/call  ms/call  min ms  min ms
> max ms  max ms  name
> 34.5        0.04     0.04  10000     0.00     0.00    0.00    0.00
> 0.91    1.13  blackpool$_run_closure2$_closure3.roll
> 18.5        0.06     0.02  39984     0.00     0.00    0.00    0.00
> 0.53    0.53  java.lang.Integer.plus
> 15.9        0.08     0.02  10000     0.00     0.01    0.00    0.00
> 0.28    1.26  blackpool$_run_closure2$_closure3.doCall
> 11.0        0.10     0.01  30000     0.00     0.00    0.00    0.00
> 0.36    0.36  org.apache.commons.math3.random.MersenneTwister.nextInt
> 5.1        0.10     0.00  10000     0.00     0.00    0.00    0.00
> 0.19    0.19  java.util.LinkedHashMap.containsKey
> 4.4        0.11     0.00  10000     0.00     0.00    0.00    0.00
> 0.02    0.02  java.util.LinkedHashMap.putAt
> 4.0        0.12     0.00      1     5.25   125.62    5.25  125.62
> 5.25  125.62  java.lang.Integer.times
> 3.9        0.12     0.00   9984     0.00     0.00    0.00    0.00
> 0.02    0.02  java.util.LinkedHashMap.getAt
> 2.2        0.12     0.00      1     2.85   128.48    2.85  128.48
> 2.85  128.48  blackpool$_run_closure2.doCall
> 
> But I don't really know how to interpret that. (Also, am I somehow
> using GProf wrongly? It seems like it shouldn't run out of memory
> profiling a 5-second program run...)
> 
> Can anyone offer any advice on what I should be looking at here?
> 
> Thanks,
> Paul

----------------------
Keith Suderman
Research Associate
Department of Computer Science
Vassar College, Poughkeepsie NY
suder...@cs.vassar.edu

Re: Optimising a Groovy script

Reply via email to