Re: Dubious performance characteristics of hash-maps

Mark Tomko Tue, 23 Feb 2010 08:43:51 -0800

I'd suggest allocating more heap than the default JVM settings, or
running the JVM in server mode.  I'm not sure what Haskell's
underlying implementation is, but I'd assume that it can grow its heap
(somewhat) arbitrarily, which the JVM cannot, beyond a fixed bound.


On Feb 23, 12:51 am, "Edward Z. Yang" <ezy...@mit.edu> wrote:
> I'd first like to state that I went into this exercise expecting
> Bagwell's Hash Array Mapped Tries to blow the competition out of the
> water; I'd get a fast functional map and give it to Haskell and people
> would rejoice.
>
> Instead, I found a functional hash-map that was approximately four times
> slower than Haskell's IntMap.  I find this puzzling, and would love to
> puzzle it out with you folks.
>
> First, the test case:
>
>     (ns maptest (:gen-class))
>
>     (defn mk-random-stream []
>         (let [r (new ec.util.MersenneTwisterFast)]
>                  (repeatedly (fn [] (. r (nextInt))))))
>
>     (defn -main [sn]
>       (let [n  (Integer/parseInt sn)
>                   rs (take (* 2 n) (mk-random-stream))]
>        (time (let [m  (apply hash-map (take (* n 2) (interleave rs rs)))]
>             (reduce
>                     (fn [s k] (let [v (get m k)]
>                                             (+ s (if (nil? v) 0 v))))
>                     0
>                     (take n (drop (/ n 2) rs)))))))
>
> We generate an n-sized hash tree, and then do n accesses, half of which
> match and half of which miss.  We compute the sum of all the values that
> catch.  We see the following characteristics:
>
>        n   Clojure   Haskell (HAMT)   Haskell (IntMap)
>      32K     .56s            .30s           .09s
>      64K     .84s            .69s           .22s
>     128K    1.65s           1.50s           .46s
>     256K    2.94s           3.23s          1.17s
>     512K    7.32s           6.96s          2.70s
>
> I was careful to compile the Clojure code before attempting to run it.
> I also pre-allocate the entire random sequence to prevent it from
> muddying the execution time (although I am using Sean Luke's excellent
> fast Mersenne Twister [1]).  The full string I used to invoke the
> program was:
>
>     /usr/lib/jvm/java-6-sun-1.6.0.16/bin/java -Dfile.encoding=UTF-8 \
>         -classpath $CLASSES maptest $@
>
> The fact that it's similar in performance to the Haskell
> reimplementation (I'd say that the variance is not particularly
> significant and both are about on par performance wise) seems to
> validate the testing methodology, I hope, though the real proof of the
> pudding would lie in implementing Haskell-style IntMap in Java.
>
> You can see the test code for the Haskell versions here:
>
>    http://github.com/ezyang/hamt
>
> So... what's up with these numbers?  Is my methodology wrong?  Is the
> garbage collector sucking?  Is the algorithm just not as good as it
> makes out to be?
>
> Cheers,
> Edward
>
> [1]http://www.cs.gmu.edu/~sean/research/

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Dubious performance characteristics of hash-maps

Reply via email to