Dear Clojurians, After my obsessed stint with object orientation, I went on to a new obsessed stint with basic functional programming with the hope of converting a nice Java boid simulation written in the popular Processing pedagological tool. I would like to find out if anyone has pointers on improving the performance of a simulation (and yes, I've read the performance threads/docs, I have notes below about what I tried). For reference a nice small example of the flocking algorithm I tried to reimplement in Clojure (along with code) can be found here:
http://www.shiffman.net/itp/classes/nature/week05_s07/flocking/ The technique was first described by Craig Reynolds in the 1980s and has since then made it's way into many contemporary games. The algorithm is interesting in that it's fairly computationally intensive. Each boid's motion is determined by calculating it's distance from every other boid. If the other boids are within a threshold, 3 different vector values (cohesion, alignment, separation) are calculated based on the location and velocites from all other boids that are within the said neighborhood. There's a lot of interation, 67500 (150x150x3) iterations for 150 boids per update/render loop. When running this in the Processing app, you can calculate and render 150 boids in about 6-8ms (new Macbook Pro 2.4ghz). I thought this little project would be a good exericise to get into deeper Clojure, and to put aside my OO leanings. First pass, I avoided premature optimization. I used only mapping operations and Clojure's built in vectors. This of course was horribly slow but I just wanted to see how nicely the algorithm could be expressed. Personally I think the algorithm is far more clear in the Clojure-centric version. However this version was awful in terms of performance. About 100-110ms to calculate all the new positions for 150 boids. http://github.com/swannodette/clojure-stuff/blob/a3b35bdd1254763bf3d535139f7ddab9f929fd3f/flocking.clj I then proceeded to read about how something like this might be optimized, and I tried a lot of things. I struggled several days, read up a lot on type hinting, on using Java Arrays in Clojure and I learned many many interesting things related to raw performance (on OS X at least). I have to say my experience with coding in Clojure thus far has been fantastic. Most of the performance metrics were determined simply by having the program run and recompiling functions on the fly while dumping the timing information into the REPL. I could remove a single aget call and see how that affected performance without needed to stop the running program. So here are some observations (would love to know if I'm way off on anything): 1. Don't use multidimensional Java arrays. You get a access time hit for each dimension EVEN if you store the embedded array in a let binding. 2. aget is at least if not more expensive than a math operation. 3. Reuse Java arrays. Making new arrays each time through a tight loop is too demanding. Implications for multithreading. I would love to hear alternative ways to implement this algorithm as well if anyone has any ideas how to make it thread safe. 4. In Clojure functions are dirt cheap, blazingly fast, almost no overhead here as far as I can tell. Invisible almost. After all of these optimization, I finally cut the framerate by a little more than half, and now the code takes bout 41ms to update and render 150 boids: http://github.com/swannodette/clojure-stuff/blob/e81308902cba849efbbaf197b6945c86e89e00ad/flocking.clj Almost all the time is spent in the three functions, separate, cohesion, and align They all look pretty much like this: (defn separate [{loc :loc, :as boid} #^floats distsv #^floats xs #^floats ys] (let [dsep (float 25.0) len (int (dec *boid-count*)) [sum acount] (loop [i (int 0) rcount (int 0) result [0 0]] (if (> i len) [result rcount] (let [x (float (aget xs i)) y (float (aget ys i)) d (float (aget distsv i)) inhood (and (> d (float 0)) (< d dsep)) ncount (int (if inhood (unchecked-inc rcount) rcount)) nresult (if inhood (add result (div (unit (sub loc [x y])) d)) result)] (recur (unchecked-inc i) ncount nresult))))] (if (> acount (int 0)) (div sum acount) sum))) With a 150 boids, rendering consumes about 10ms (overhead from talking to proxied Java class?) separate, cohesion, and align together add an additional 30ms. I am pretty much out of ideas on how to optimize this further in pure Clojure. It seems to me that it is quite possible and I would love to hear from the Clojure experts. It seemed to me at one point that there was an implicit cost in using loop/recur (I went and removed most of the math in the loop/recurs and I didn't really see much in performance gains). I tried changing the loop/recur into a dotimes and the performance got worse. Running the original Processing version I can render about 700 boids and get about the same amount of calculation time as I do in pure Clojure. So the Processing version is still about 4.6X faster. Sorry for the long post, definitely interested in how I could close the gap. Of course I'm totally ready to accept the fact that this particular algorithm is best suited for having parts written in Java, but I really don't like Java and I love Clojure ;) --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---