A few minutes ago I finished copying, pasting, and doing a little reformatting on Mark Engelberg's document on the subject.
http://dev.clojure.org/display/design/Better+hashing Andy On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller <a...@puredanger.com> wrote: > Has anyone created a design page on dev.clojure for this yet? > > > On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote: > >> Just a follow up after a fair amount of thinking and experimentation has >> been done on this problem. >> >> Mark Engelberg has a very nice document describing the existing hash >> functions used by Clojure, examples with why they can cause collisions so >> often, and suggestions for changes to them to have fewer collisions in >> these cases. https://docs.google.com/document/d/10olJzjNHXn1Si1LsSvjNqF_ >> X1O7OTPZLuv3Uo_duY5o/edit >> >> I have made modifications to a fork of the Clojure code base with his >> suggested hash modifications here, for experimentation purposes: >> https://github.com/jafingerhut/clojure I also have some modifications >> to Paul's N-queens Clojure program to print out additional statistics and >> try out different hash functions here: https://github.com/ >> jafingerhut/chess-clojure >> >> Running Paul's N-queens problem with a 6x6 board without any changes to >> Clojure takes about 7 mins on one of my computers. With those changes to >> the hash functions it finishes in about 12 sec. >> >> I haven't let a 6x9 board run to completion without those hash changes, >> but it takes a long time :-) With those changes to the hash functions it >> finishes in about 11 minutes. >> >> The reason for the currently slow behavior is a combination of the >> current hash functions and the implementation of the PersistentHashSet data >> structure used to implement sets. A PersistentHashSet is, very roughly, a >> 32-way branching tree with the hash function of the elements used to decide >> which branch to take. When you hit a leaf in the tree, all elements with >> the same hash function value are put into an array that is scanned linearly >> whenever you want to check whether a value is already in the set. So the >> more hash collisions, the more it is like using an array to store and look >> up set elements in O(n) time. >> >> With the 6x9 board, there are 20,136,752 solutions. The way those >> solutions are represented in the program as sets of vectors, those 20 >> million values only have 17,936 different hash values. Thus the average >> length of those arrays of values in the tree leaves is 1,122 values. The >> longest of those arrays has 81,610 values in it, all with the same hash >> function. >> >> With Mark's proposed hash function changes, those 20,136,752 values hash >> to 20,089,488 different ints, for an average array length of 1 value, and a >> longest array length of 3 values. Big speedup. >> >> There is nothing unique about Mark's particular hash functions that >> achieves this. Other hash modifications can give similar improvements. >> The Clojure dev team is considering which changes would be good ones to >> make, in hopes of changing them once and not having to do it again. >> >> Andy >> >> >> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote: >> >>> I've been playing around with a generalised version of the N-Queens >>> problem that handles other chess pieces. I have a Scala implementation that >>> uses an eager depth-first search. Given enough RAM (around 12G - it's very >>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5 >>> minutes on my MacBook Pro. >>> >>> I then created a Clojure version of the same algorithm with the >>> intention of (eventually) using laziness to avoid the memory issue. >>> However, the performance of the Clojure version is several orders of >>> magnitude worse than the Scala version (I've left it running overnight on >>> the problem that the Scala version solves in 2.5 minutes and it still >>> hasn't completed). >>> >>> I'm struggling to see why the performance of the Clojure version is so >>> much worse than the Scala. Profiling in YourKit suggests that it's spending >>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain >>> why. I would be grateful for any insight. >>> >>> The code for the two different versions is on GitHub: >>> >>> https://github.com/paulbutcher/chess-scala >>> https://github.com/paulbutcher/chess-clojure >>> >>> Notes: >>> >>> - I know that an exhaustive depth-first search isn't a great way to >>> tackle this problem. But I'd like to understand why I see such a dramatic >>> difference between the performance of the Scala and Clojure versions. >>> >>> - I believe that I've implemented the same algorithm in both Scala and >>> Clojure - certainly they both generate the same results for small problems >>> (there are 3x3 and 4x4 problems commented out in the source). But clearly I >>> can't rule out the possibility that I've made a mistake in the Clojure >>> implementation. If anyone can spot my stupid mistake, I'd greatly >>> appreciate it. >>> >>> Thanks in advance for any insight that anyone can offer. >>> >>> -- >>> paul.butcher->msgCount++ >>> >>> Snetterton, Castle Combe, Cadwell Park... >>> Who says I have a one track mind? >>> >>> http://www.paulbutcher.com/ >>> LinkedIn: http://www.linkedin.com/in/paulbutcher >>> MSN: pa...@paulbutcher.com >>> AIM: paulrabutcher >>> Skype: paulrabutcher >>> >>> -- >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To post to this group, send email to clo...@googlegroups.com >>> >>> Note that posts from new members are moderated - please be patient with >>> your first post. >>> To unsubscribe from this group, send email to >>> clojure+u...@googlegroups.com >>> >>> For more options, visit this group at >>> http://groups.google.com/group/clojure?hl=en >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to clojure+u...@googlegroups.com. >>> >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >> >> -- > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with > your first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.