Mark, I think you have hit the nail on the head.  I have instrumented a
copy of Paul's Clojure program to print the hash code of all of the
solutions in the set returned by solve, and there are *many* pairs of
solutions that have identical hash values, and thus collide in a
PersistentHashSet.  Below is an excerpt from that output:

    97392280 = (hash #{[:N [2 0]] [:R [4 2]] [:K [0 0]] [:Q [1 3]] [:B [3
0]]})
    97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [1 2]] [:Q [4 3]] [:B [2
0]]})
    97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [2 1]] [:Q [1 4]] [:B [4
0]]})
    97392280 = (hash #{[:N [4 0]] [:K [0 0]] [:R [3 3]] [:Q [1 2]] [:B [2
0]]})
    97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [4
0]]})
    97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [4
0]]})
    97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [3
0]]})
    97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [3
0]]})
    97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:Q [4 2]] [:R [2 3]] [:B [1
0]]})
    97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:R [4 2]] [:Q [2 3]] [:B [1
0]]})
    97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:Q [4 2]] [:R [2 3]] [:B [0
0]]})
    97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:R [4 2]] [:Q [2 3]] [:B [0
0]]})
    97392280 = (hash #{[:K [4 0]] [:N [0 0]] [:Q [3 2]] [:R [1 3]] [:B [2
0]]})
    97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [3 2]] [:Q [0 3]] [:B [2
0]]})
    97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [2 1]] [:Q [3 4]] [:B [0
0]]})
    97392280 = (hash #{[:N [2 0]] [:K [4 0]] [:R [0 2]] [:Q [3 3]] [:B [1
0]]})

The hash value of a set is the sum of the hashes of its elements, because
sets are not ordered.  The hash of a vector is dependent on the order of
the elements.  In fact, for 2-element vectors it is 31*(hash of 1st elem) +
(hash of 2nd elem).  The hash value of a small integer is just the integer
value itself.

Thus with many small integers in 2-element vectors like in this problem,
all it takes is having a set of vectors where the first elements sum to the
same value, and the second elements sum to the same value, and the hash of
the whole set of vectors will be equal.

Exactly what would be the best way to improve the hash code for situations
like this, I am not sure yet.

Andy


On Wed, Oct 23, 2013 at 8:46 AM, Mark Engelberg <mark.engelb...@gmail.com>wrote:

> For those of you playing along at home, I also ran Paul's chess program in
> YourKit with the sampling profiler.  Here are all the calls that appear in
> the "HotSpots" view, along with time spent in those methods.
>
> java.io.PushbackInputStream.read() 422977
> clojure.lang.PersistentHashMap$HashCollisionNode.findIndex(Object) 206687
> clojure.lang.Util.equiv(Object, Object) 204984
> clojure.lang.PersistentHashMap$BitmapIndexedNode.find(int, int, Object,
> Object) 145968
> clojure.lang.PersistentHashMap$NodeSeq.create(Object[]) 81031
> clojure.lang.APersistentVector.doEquiv(IPersistentVector, Object) 41656
> clojure.lang.Util.dohasheq(IHashEq) 31453
> clojure.lang.RT.next(Object) 28671
> clojure.lang.PersistentHashMap$NodeSeq.next() 24484
> clojure.lang.PersistentVector$1.<init>(PersistentVector, int, int) 24015
>
> I concur with Paul that it looks like the program is spending nearly all
> its time mired in hashing.  Even the calls on here that aren't obviously
> related to hashing turn out on further inspection to be related to hashing.
>
> For example, doing a backtrace on Util.equiv, one sees that the =
> operation in the allowed? function is taking negligible time and the equiv
> method time is almost entirely spent in hash set lookup resolution.
> Similarly, the RT.next method is negligible time spent stepping through his
> for sequence, and appears to be almost entirely about stepping through hash
> collision nodes.
>
> My best guess at this point is that there is some sort of bug with the set
> implementation causing too many items to hash to the same bucket.  I have
> seen this behavior before from time to time when profiling some of my own
> code (a suspiciously inordinate amount of time spent resolving hash
> collisions) but have never been able to pin it down with such a pure
> example.
>
> I'll admit that my skills in reading and making sense out of profiler
> output is pretty weak, but it looks to me that that's what is going on here.
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to