On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@mit.edu> wrote: > I'd first like to state that I went into this exercise expecting > Bagwell's Hash Array Mapped Tries to blow the competition out of the > water; I'd get a fast functional map and give it to Haskell and people > would rejoice. > > Instead, I found a functional hash-map that was approximately four times > slower than Haskell's IntMap. I find this puzzling, and would love to > puzzle it out with you folks. > > First, the test case: > > (ns maptest (:gen-class)) > > (defn mk-random-stream [] > (let [r (new ec.util.MersenneTwisterFast)] > (repeatedly (fn [] (. r (nextInt)))))) > > (defn -main [sn] > (let [n (Integer/parseInt sn) > rs (take (* 2 n) (mk-random-stream))] > (time (let [m (apply hash-map (take (* n 2) (interleave rs rs)))] > (reduce > (fn [s k] (let [v (get m k)] > (+ s (if (nil? v) 0 v)))) > 0 > (take n (drop (/ n 2) rs))))))) > > We generate an n-sized hash tree, and then do n accesses, half of which > match and half of which miss. We compute the sum of all the values that > catch. We see the following characteristics: > > n Clojure Haskell (HAMT) Haskell (IntMap) > 32K .56s .30s .09s > 64K .84s .69s .22s > 128K 1.65s 1.50s .46s > 256K 2.94s 3.23s 1.17s > 512K 7.32s 6.96s 2.70s > > I was careful to compile the Clojure code before attempting to run it. > I also pre-allocate the entire random sequence to prevent it from > muddying the execution time (although I am using Sean Luke's excellent > fast Mersenne Twister [1]). The full string I used to invoke the > program was: > > /usr/lib/jvm/java-6-sun-1.6.0.16/bin/java -Dfile.encoding=UTF-8 \ > -classpath $CLASSES maptest $@ > > The fact that it's similar in performance to the Haskell > reimplementation (I'd say that the variance is not particularly > significant and both are about on par performance wise) seems to > validate the testing methodology, I hope, though the real proof of the > pudding would lie in implementing Haskell-style IntMap in Java. > > You can see the test code for the Haskell versions here: > > http://github.com/ezyang/hamt > > So... what's up with these numbers? Is my methodology wrong? Is the > garbage collector sucking? Is the algorithm just not as good as it > makes out to be? >
Yes, the methodology is wrong, at least on the Clojure/JVM side. First, benchmarking anything without -server is a waste of time. Second, the JVM does dynamic profile-driven compilation - performance on single cold runs is not indicative of optimal performance. Third, putting the random stream gen call in the let doesn't preallocate the sequence, nor move its execution there (and thus out of the timing), as Clojure's repeatedly/take etc are lazy. Fourth, it's possible to structure the code more like your Haskell IntMap code [1] (i.e. no interleave). Fifth, AOT compilation makes no difference in performance - Clojure code is always compiled. Try this: ;run with java -server -Xmx1024m (defn mk-random-stream [] (let [r (new ec.util.MersenneTwisterFast)] (repeatedly (fn [] (. r (nextInt)))))) (defn main [i] (let [vals (vec (take (* i 2) (mk-random-stream)))] (dotimes [_ 10] (time (let [m (apply hash-map (take i vals))] (reduce (fn [s k] (+ s (m k 0))) 0 (take i (drop (/ i 2) vals)))))))) (doseq [n (range 15 20)] (let [i (int (Math/pow 2 n))] (println " I = " i) (main i))) I = 32768 "Elapsed time: 199.499 msecs" "Elapsed time: 43.018 msecs" "Elapsed time: 24.715 msecs" "Elapsed time: 24.081 msecs" "Elapsed time: 39.707 msecs" "Elapsed time: 23.909 msecs" "Elapsed time: 25.918 msecs" "Elapsed time: 27.328 msecs" "Elapsed time: 23.797 msecs" "Elapsed time: 23.874 msecs" I = 65536 "Elapsed time: 66.411 msecs" "Elapsed time: 298.676 msecs" "Elapsed time: 58.821 msecs" "Elapsed time: 66.621 msecs" "Elapsed time: 60.428 msecs" "Elapsed time: 56.664 msecs" "Elapsed time: 67.778 msecs" "Elapsed time: 56.833 msecs" "Elapsed time: 58.613 msecs" "Elapsed time: 60.37 msecs" I = 131072 "Elapsed time: 118.257 msecs" "Elapsed time: 173.605 msecs" "Elapsed time: 121.663 msecs" "Elapsed time: 166.162 msecs" "Elapsed time: 118.889 msecs" "Elapsed time: 158.164 msecs" "Elapsed time: 120.545 msecs" "Elapsed time: 157.452 msecs" "Elapsed time: 120.446 msecs" "Elapsed time: 154.614 msecs" I = 262144 "Elapsed time: 390.623 msecs" "Elapsed time: 247.928 msecs" "Elapsed time: 336.522 msecs" "Elapsed time: 361.865 msecs" "Elapsed time: 250.092 msecs" "Elapsed time: 324.004 msecs" "Elapsed time: 619.761 msecs" "Elapsed time: 250.708 msecs" "Elapsed time: 253.414 msecs" "Elapsed time: 247.428 msecs" I = 524288 "Elapsed time: 752.528 msecs" "Elapsed time: 1254.839 msecs" "Elapsed time: 1126.026 msecs" "Elapsed time: 589.351 msecs" "Elapsed time: 532.753 msecs" "Elapsed time: 532.615 msecs" "Elapsed time: 554.859 msecs" "Elapsed time: 529.93 msecs" "Elapsed time: 525.966 msecs" "Elapsed time: 551.065 msecs" Note as HotSpot kicks in to optimize the code. Note also the near-linearity of the times with increase in N, indicating the promised near-constant-time performance (unlike the IntMap times). Regards, Rich [1] http://github.com/ezyang/hamt/blob/master/IntMapTest.hs -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en