Thanks to the work and thought of Mark Engelberg, Alex Miller, Rich Hickey,
myself, and likely several others who I should be naming here but am
forgetting, the latest (not yet released) Clojure master version has an
improved hash function that gives a much better variety of hash values, and
thus much improved performance for hash-based collections like hash maps
and hash sets, than previous versions of Clojure.

For example, the N-queen program that Paul Butcher wrote and was asking
about its performance at the beginning of this thread, improves from 6.7
minutes before these hash changes to 12.8 seconds after the changes, when
run on the 6x6 board.  For the 6x9 board, the improvement is from
"something over 8 hours that I didn't measure precisely because I didn't
want to wait" down to 12.8 minutes.  A few more experimental results are
given on the design wiki page [1].

[1] http://dev.clojure.org/display/design/Better+hashing

Andy


On Sat, Nov 2, 2013 at 9:44 AM, Andy Fingerhut <andy.finger...@gmail.com>wrote:

> A few minutes ago I finished copying, pasting, and doing a little
> reformatting on Mark Engelberg's document on the subject.
>
>     http://dev.clojure.org/display/design/Better+hashing
>
> Andy
>
>
> On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller <a...@puredanger.com> wrote:
>
>> Has anyone created a design page on dev.clojure for this yet?
>>
>>
>> On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote:
>>
>>> Just a follow up after a fair amount of thinking and experimentation has
>>> been done on this problem.
>>>
>>> Mark Engelberg has a very nice document describing the existing hash
>>> functions used by Clojure, examples with why they can cause collisions so
>>> often, and suggestions for changes to them to have fewer collisions in
>>> these cases.  https://docs.google.com/document/d/
>>> 10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit
>>>
>>> I have made modifications to a fork of the Clojure code base with his
>>> suggested hash modifications here, for experimentation purposes:
>>> https://github.com/jafingerhut/clojure   I also have some modifications
>>> to Paul's N-queens Clojure program to print out additional statistics and
>>> try out different hash functions here: https://github.com/
>>> jafingerhut/chess-clojure
>>>
>>> Running Paul's N-queens problem with a 6x6 board without any changes to
>>> Clojure takes about 7 mins on one of my computers.  With those changes to
>>> the hash functions it finishes in about 12 sec.
>>>
>>> I haven't let a 6x9 board run to completion without those hash changes,
>>> but it takes a long time :-)  With those changes to the hash functions it
>>> finishes in about 11 minutes.
>>>
>>> The reason for the currently slow behavior is a combination of the
>>> current hash functions and the implementation of the PersistentHashSet data
>>> structure used to implement sets.  A PersistentHashSet is, very roughly, a
>>> 32-way branching tree with the hash function of the elements used to decide
>>> which branch to take.  When you hit a leaf in the tree, all elements with
>>> the same hash function value are put into an array that is scanned linearly
>>> whenever you want to check whether a value is already in the set.  So the
>>> more hash collisions, the more it is like using an array to store and look
>>> up set elements in O(n) time.
>>>
>>> With the 6x9 board, there are 20,136,752 solutions.  The way those
>>> solutions are represented in the program as sets of vectors, those 20
>>> million values only have 17,936 different hash values.  Thus the average
>>> length of those arrays of values in the tree leaves is 1,122 values.  The
>>> longest of those arrays has 81,610 values in it, all with the same hash
>>> function.
>>>
>>> With Mark's proposed hash function changes, those 20,136,752 values hash
>>> to 20,089,488 different ints, for an average array length of 1 value, and a
>>> longest array length of 3 values.  Big speedup.
>>>
>>> There is nothing unique about Mark's particular hash functions that
>>> achieves this.  Other hash modifications can give similar improvements.
>>> The Clojure dev team is considering which changes would be good ones to
>>> make, in hopes of changing them once and not having to do it again.
>>>
>>> Andy
>>>
>>>
>>> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote:
>>>
>>>> I've been playing around with a generalised version of the N-Queens
>>>> problem that handles other chess pieces. I have a Scala implementation that
>>>> uses an eager depth-first search. Given enough RAM (around 12G - it's very
>>>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5
>>>> minutes on my MacBook Pro.
>>>>
>>>> I then created a Clojure version of the same algorithm with the
>>>> intention of (eventually) using laziness to avoid the memory issue.
>>>> However, the performance of the Clojure version is several orders of
>>>> magnitude worse than the Scala version (I've left it running overnight on
>>>> the problem that the Scala version solves in 2.5 minutes and it still
>>>> hasn't completed).
>>>>
>>>> I'm struggling to see why the performance of the Clojure version is so
>>>> much worse than the Scala. Profiling in YourKit suggests that it's spending
>>>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain
>>>> why. I would be grateful for any insight.
>>>>
>>>> The code for the two different versions is on GitHub:
>>>>
>>>> https://github.com/paulbutcher/chess-scala
>>>>  https://github.com/paulbutcher/chess-clojure
>>>>
>>>> Notes:
>>>>
>>>> - I know that an exhaustive depth-first search isn't a great way to
>>>> tackle this problem. But I'd like to understand why I see such a dramatic
>>>> difference between the performance of the Scala and Clojure versions.
>>>>
>>>> - I believe that I've implemented the same algorithm in both Scala and
>>>> Clojure - certainly they both generate the same results for small problems
>>>> (there are 3x3 and 4x4 problems commented out in the source). But clearly I
>>>> can't rule out the possibility that I've made a mistake in the Clojure
>>>> implementation. If anyone can spot my stupid mistake, I'd greatly
>>>> appreciate it.
>>>>
>>>> Thanks in advance for any insight that anyone can offer.
>>>>
>>>> --
>>>> paul.butcher->msgCount++
>>>>
>>>> Snetterton, Castle Combe, Cadwell Park...
>>>> Who says I have a one track mind?
>>>>
>>>> http://www.paulbutcher.com/
>>>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>>>> MSN: pa...@paulbutcher.com
>>>> AIM: paulrabutcher
>>>> Skype: paulrabutcher
>>>>
>>>>  --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>>
>>>> Note that posts from new members are moderated - please be patient with
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to clojure+u...@googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to