+1

On Thu, Jan 30, 2014 at 11:13 AM, Andy Fingerhut
<andy.finger...@gmail.com>wrote:

> Thanks to the work and thought of Mark Engelberg, Alex Miller, Rich
> Hickey, myself, and likely several others who I should be naming here but
> am forgetting, the latest (not yet released) Clojure master version has an
> improved hash function that gives a much better variety of hash values, and
> thus much improved performance for hash-based collections like hash maps
> and hash sets, than previous versions of Clojure.
>
> For example, the N-queen program that Paul Butcher wrote and was asking
> about its performance at the beginning of this thread, improves from 6.7
> minutes before these hash changes to 12.8 seconds after the changes, when
> run on the 6x6 board.  For the 6x9 board, the improvement is from
> "something over 8 hours that I didn't measure precisely because I didn't
> want to wait" down to 12.8 minutes.  A few more experimental results are
> given on the design wiki page [1].
>
> [1] http://dev.clojure.org/display/design/Better+hashing
>
> Andy
>
>
> On Sat, Nov 2, 2013 at 9:44 AM, Andy Fingerhut 
> <andy.finger...@gmail.com>wrote:
>
>> A few minutes ago I finished copying, pasting, and doing a little
>> reformatting on Mark Engelberg's document on the subject.
>>
>>     http://dev.clojure.org/display/design/Better+hashing
>>
>> Andy
>>
>>
>> On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller <a...@puredanger.com> wrote:
>>
>>> Has anyone created a design page on dev.clojure for this yet?
>>>
>>>
>>> On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote:
>>>
>>>> Just a follow up after a fair amount of thinking and experimentation
>>>> has been done on this problem.
>>>>
>>>> Mark Engelberg has a very nice document describing the existing hash
>>>> functions used by Clojure, examples with why they can cause collisions so
>>>> often, and suggestions for changes to them to have fewer collisions in
>>>> these cases.  https://docs.google.com/document/d/
>>>> 10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit
>>>>
>>>> I have made modifications to a fork of the Clojure code base with his
>>>> suggested hash modifications here, for experimentation purposes:
>>>> https://github.com/jafingerhut/clojure   I also have some
>>>> modifications to Paul's N-queens Clojure program to print out additional
>>>> statistics and try out different hash functions here:
>>>> https://github.com/jafingerhut/chess-clojure
>>>>
>>>> Running Paul's N-queens problem with a 6x6 board without any changes to
>>>> Clojure takes about 7 mins on one of my computers.  With those changes to
>>>> the hash functions it finishes in about 12 sec.
>>>>
>>>> I haven't let a 6x9 board run to completion without those hash changes,
>>>> but it takes a long time :-)  With those changes to the hash functions it
>>>> finishes in about 11 minutes.
>>>>
>>>> The reason for the currently slow behavior is a combination of the
>>>> current hash functions and the implementation of the PersistentHashSet data
>>>> structure used to implement sets.  A PersistentHashSet is, very roughly, a
>>>> 32-way branching tree with the hash function of the elements used to decide
>>>> which branch to take.  When you hit a leaf in the tree, all elements with
>>>> the same hash function value are put into an array that is scanned linearly
>>>> whenever you want to check whether a value is already in the set.  So the
>>>> more hash collisions, the more it is like using an array to store and look
>>>> up set elements in O(n) time.
>>>>
>>>> With the 6x9 board, there are 20,136,752 solutions.  The way those
>>>> solutions are represented in the program as sets of vectors, those 20
>>>> million values only have 17,936 different hash values.  Thus the average
>>>> length of those arrays of values in the tree leaves is 1,122 values.  The
>>>> longest of those arrays has 81,610 values in it, all with the same hash
>>>> function.
>>>>
>>>> With Mark's proposed hash function changes, those 20,136,752 values
>>>> hash to 20,089,488 different ints, for an average array length of 1 value,
>>>> and a longest array length of 3 values.  Big speedup.
>>>>
>>>> There is nothing unique about Mark's particular hash functions that
>>>> achieves this.  Other hash modifications can give similar improvements.
>>>> The Clojure dev team is considering which changes would be good ones to
>>>> make, in hopes of changing them once and not having to do it again.
>>>>
>>>> Andy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote:
>>>>
>>>>> I've been playing around with a generalised version of the N-Queens
>>>>> problem that handles other chess pieces. I have a Scala implementation 
>>>>> that
>>>>> uses an eager depth-first search. Given enough RAM (around 12G - it's very
>>>>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5
>>>>> minutes on my MacBook Pro.
>>>>>
>>>>> I then created a Clojure version of the same algorithm with the
>>>>> intention of (eventually) using laziness to avoid the memory issue.
>>>>> However, the performance of the Clojure version is several orders of
>>>>> magnitude worse than the Scala version (I've left it running overnight on
>>>>> the problem that the Scala version solves in 2.5 minutes and it still
>>>>> hasn't completed).
>>>>>
>>>>> I'm struggling to see why the performance of the Clojure version is so
>>>>> much worse than the Scala. Profiling in YourKit suggests that it's 
>>>>> spending
>>>>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain
>>>>> why. I would be grateful for any insight.
>>>>>
>>>>> The code for the two different versions is on GitHub:
>>>>>
>>>>> https://github.com/paulbutcher/chess-scala
>>>>>  https://github.com/paulbutcher/chess-clojure
>>>>>
>>>>> Notes:
>>>>>
>>>>> - I know that an exhaustive depth-first search isn't a great way to
>>>>> tackle this problem. But I'd like to understand why I see such a dramatic
>>>>> difference between the performance of the Scala and Clojure versions.
>>>>>
>>>>> - I believe that I've implemented the same algorithm in both Scala and
>>>>> Clojure - certainly they both generate the same results for small problems
>>>>> (there are 3x3 and 4x4 problems commented out in the source). But clearly 
>>>>> I
>>>>> can't rule out the possibility that I've made a mistake in the Clojure
>>>>> implementation. If anyone can spot my stupid mistake, I'd greatly
>>>>> appreciate it.
>>>>>
>>>>> Thanks in advance for any insight that anyone can offer.
>>>>>
>>>>> --
>>>>> paul.butcher->msgCount++
>>>>>
>>>>> Snetterton, Castle Combe, Cadwell Park...
>>>>> Who says I have a one track mind?
>>>>>
>>>>> http://www.paulbutcher.com/
>>>>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>>>>> MSN: pa...@paulbutcher.com
>>>>> AIM: paulrabutcher
>>>>> Skype: paulrabutcher
>>>>>
>>>>>  --
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>
>>>>> Note that posts from new members are moderated - please be patient
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> clojure+u...@googlegroups.com
>>>>>
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to clojure+u...@googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>
>>>>  --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clojure@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+unsubscr...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to