Hello all, I spent this weekend attempting to implement Phil Bagwell's Hash Array Mapped Tries in Haskell. You can view the fruits of this work at this repository:
http://github.com/ezyang/hamt

Unfortunately, I'm getting pretty abysmal performance out of the implementation I wrote, so I'm hoping to get some performance tuning insights here.

My cost centres look about like this:

                              time%  alloc%
    i-Full-update  HAMT        29.1    31.4
    lookup'        HAMT        13.7     4.3
    main           Main        12.8    19.7
    subkey         HAMT        11.5    10.0
    i-Bitmap       HAMT         9.4    12.0
    i-Full         HAMT         8.1    12.9
    insert'        HAMT         5.1     0.6
    popCount       PopCount     4.7     1.0

And my GC stats look like this (IntMap is included for comparison):

    ./HAMTTest 256000 +RTS -t
    275576630962922
    <<ghc: 410718260 bytes, 788 GCs, 19646424/46880276 avg/max bytes residency
      (9 samples), 138M in use, 0.00 INIT (0.00 elapsed), 1.14 MUT (1.13 elapsed),
      1.88 GC (2.00 elapsed) :ghc>>
    + 0:03.16

    ./IntMapTest 256000 +RTS -t
    -710684043813
    <<ghc: 172126372 bytes, 329 GCs, 5552955/10483896 avg/max bytes residency
      (9 samples), 31M in use, 0.00 INIT (0.00 elapsed), 0.51 MUT (0.51 elapsed),
      0.62 GC (0.67 elapsed) :ghc>>
    + 0:01.18

The HAMT runs about 3-5x slower than IntMap and uses about 4x more memory (and is correspondingly penalized when garbage collection occurs). My observations/suspicions are as follows:

* i-Full-update essentially copies a 32-element vector with a change to one element (the P.S. below sketches the operation I mean). I think I am being brutally punished for this, in terms of both memory usage and runtime. What I'm curious about is whether this cost is intrinsic to the algorithm, or whether it's something special that GHC is doing.

* Bagwell suggests that each node in the array-mapped trie should take only two words. I think I'm losing another two words to the bounds information that the vector type stores, as well as the costs of boxing (though I can't tell whether GHC is boxing or not). I've done some playing around with the data declaration, but its current form seems to give the fastest implementation. I wonder whether this is what's causing the 4x ballooning of memory, or if it's something else. I haven't tried rewriting the code to use GHC's arrays.

* Increasing the heap size seems to generally improve the performance of the HAMT; a 1G heap on a 512K sample set reduces the gap to about 1.5-2x slower. It's hard to tell whether the memory allocation system is working for or against me.

What bothers me is that even if I could magically wave away all the GC time in the HAMT, it would only barely beat IntMap. Maybe the algorithm simply isn't as good as a good ole' Patricia trie; but then again, maybe not.

Comments and suggestions greatly appreciated.

Thanks,
Edward
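
P.S. For anyone who wants to poke at this without pulling the repo, here is a minimal, self-contained sketch of the lookup path that the subkey / popCount / lookup' cost centres above correspond to. The names and the data declaration are simplified from what's actually in the repository (and I use my own popCount routine, whereas the sketch just uses Data.Bits.popCount), so treat it as illustrative rather than the real code:

    {-# LANGUAGE BangPatterns #-}
    module HAMTSketch where

    import Data.Bits (popCount, shiftL, shiftR, testBit, (.&.))
    import Data.Word (Word32)
    import qualified Data.Vector as V

    type Key    = Word32
    type Bitmap = Word32

    data HAMT a
        = Empty
        | Leaf !Key a
        | Bitmap !Bitmap !(V.Vector (HAMT a))  -- sparse node
        | Full !(V.Vector (HAMT a))            -- all 32 children present

    bitsPerSubkey :: Int
    bitsPerSubkey = 5

    -- The 5-bit slice of the key consumed at the current depth.
    subkey :: Key -> Int -> Int
    subkey k s = fromIntegral (k `shiftR` s) .&. 0x1F

    -- Index into a sparse node's compressed child array: count the
    -- bits set below our position in the bitmap.
    sparseIndex :: Bitmap -> Int -> Int
    sparseIndex b sk = popCount (b .&. ((1 `shiftL` sk) - 1))

    lookup' :: Key -> HAMT a -> Maybe a
    lookup' k = go 0
      where
        go !s t = case t of
            Empty -> Nothing
            Leaf k' v
                | k == k'   -> Just v
                | otherwise -> Nothing
            Bitmap b cs
                | testBit b sk -> go (s + bitsPerSubkey) (cs V.! sparseIndex b sk)
                | otherwise    -> Nothing
              where sk = subkey k s
            Full cs -> go (s + bitsPerSubkey) (cs V.! subkey k s)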
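
And, continuing the sketch above, this is in essence the i-Full-update operation that tops the profile (again simplified; V.// is the vector library's bulk-update operator):

    -- The hot spot, in essence: replacing one child of a Full node.
    -- V.// allocates and fills a brand-new 32-element vector, so every
    -- insert that passes through a Full node pays for copying all 32
    -- slots, even though only one of them changed.
    updateFull :: V.Vector (HAMT a) -> Int -> HAMT a -> V.Vector (HAMT a)
    updateFull cs i child = cs V.// [(i, child)]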