A few minutes ago I finished copying, pasting, and doing a little
reformatting on Mark Engelberg's document on the subject.

    http://dev.clojure.org/display/design/Better+hashing

Andy


On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller <a...@puredanger.com> wrote:

> Has anyone created a design page on dev.clojure for this yet?
>
>
> On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote:
>
>> Just a follow up after a fair amount of thinking and experimentation has
>> been done on this problem.
>>
>> Mark Engelberg has a very nice document describing the existing hash
>> functions used by Clojure, examples with why they can cause collisions so
>> often, and suggestions for changes to them to have fewer collisions in
>> these cases.  https://docs.google.com/document/d/10olJzjNHXn1Si1LsSvjNqF_
>> X1O7OTPZLuv3Uo_duY5o/edit
>>
>> I have made modifications to a fork of the Clojure code base with his
>> suggested hash modifications here, for experimentation purposes:
>> https://github.com/jafingerhut/clojure   I also have some modifications
>> to Paul's N-queens Clojure program to print out additional statistics and
>> try out different hash functions here: https://github.com/
>> jafingerhut/chess-clojure
>>
>> Running Paul's N-queens problem with a 6x6 board without any changes to
>> Clojure takes about 7 mins on one of my computers.  With those changes to
>> the hash functions it finishes in about 12 sec.
>>
>> I haven't let a 6x9 board run to completion without those hash changes,
>> but it takes a long time :-)  With those changes to the hash functions it
>> finishes in about 11 minutes.
>>
>> The reason for the currently slow behavior is a combination of the
>> current hash functions and the implementation of the PersistentHashSet data
>> structure used to implement sets.  A PersistentHashSet is, very roughly, a
>> 32-way branching tree with the hash function of the elements used to decide
>> which branch to take.  When you hit a leaf in the tree, all elements with
>> the same hash function value are put into an array that is scanned linearly
>> whenever you want to check whether a value is already in the set.  So the
>> more hash collisions, the more it is like using an array to store and look
>> up set elements in O(n) time.
>>
>> With the 6x9 board, there are 20,136,752 solutions.  The way those
>> solutions are represented in the program as sets of vectors, those 20
>> million values only have 17,936 different hash values.  Thus the average
>> length of those arrays of values in the tree leaves is 1,122 values.  The
>> longest of those arrays has 81,610 values in it, all with the same hash
>> function.
>>
>> With Mark's proposed hash function changes, those 20,136,752 values hash
>> to 20,089,488 different ints, for an average array length of 1 value, and a
>> longest array length of 3 values.  Big speedup.
>>
>> There is nothing unique about Mark's particular hash functions that
>> achieves this.  Other hash modifications can give similar improvements.
>> The Clojure dev team is considering which changes would be good ones to
>> make, in hopes of changing them once and not having to do it again.
>>
>> Andy
>>
>>
>> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote:
>>
>>> I've been playing around with a generalised version of the N-Queens
>>> problem that handles other chess pieces. I have a Scala implementation that
>>> uses an eager depth-first search. Given enough RAM (around 12G - it's very
>>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5
>>> minutes on my MacBook Pro.
>>>
>>> I then created a Clojure version of the same algorithm with the
>>> intention of (eventually) using laziness to avoid the memory issue.
>>> However, the performance of the Clojure version is several orders of
>>> magnitude worse than the Scala version (I've left it running overnight on
>>> the problem that the Scala version solves in 2.5 minutes and it still
>>> hasn't completed).
>>>
>>> I'm struggling to see why the performance of the Clojure version is so
>>> much worse than the Scala. Profiling in YourKit suggests that it's spending
>>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain
>>> why. I would be grateful for any insight.
>>>
>>> The code for the two different versions is on GitHub:
>>>
>>> https://github.com/paulbutcher/chess-scala
>>> https://github.com/paulbutcher/chess-clojure
>>>
>>> Notes:
>>>
>>> - I know that an exhaustive depth-first search isn't a great way to
>>> tackle this problem. But I'd like to understand why I see such a dramatic
>>> difference between the performance of the Scala and Clojure versions.
>>>
>>> - I believe that I've implemented the same algorithm in both Scala and
>>> Clojure - certainly they both generate the same results for small problems
>>> (there are 3x3 and 4x4 problems commented out in the source). But clearly I
>>> can't rule out the possibility that I've made a mistake in the Clojure
>>> implementation. If anyone can spot my stupid mistake, I'd greatly
>>> appreciate it.
>>>
>>> Thanks in advance for any insight that anyone can offer.
>>>
>>> --
>>> paul.butcher->msgCount++
>>>
>>> Snetterton, Castle Combe, Cadwell Park...
>>> Who says I have a one track mind?
>>>
>>> http://www.paulbutcher.com/
>>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>>> MSN: pa...@paulbutcher.com
>>> AIM: paulrabutcher
>>> Skype: paulrabutcher
>>>
>>>  --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>>
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+u...@googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to