One possibility of a change to PersistentHashSet and PersistentHashMap that
would help avoid these cases with pathologically bad performance:

If we had a 'universal comparator', i.e. a comparison function that
provided a total order on any pair of values that anyone would ever want to
put into a set or use as a map key, then instead of having linked lists for
values that collide, we could have trees like those in the implementations
of sorted-maps and sorted-sets today.  Such a comparison function might
look something like 'cc-cmp' a couple of pages down from this link:


https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-that-work-between-different-types

However, I believe there will always be some types where such a comparator
will fail, and thus not truly be 'universal'.  Still, if it hit the sweet
spot of 99.99% of the use cases people cared about, it might be good to
have.

Andy


On Wed, Oct 23, 2013 at 9:06 AM, Andy Fingerhut <andy.finger...@gmail.com>wrote:

> Mark, I think you have hit the nail on the head.  I have instrumented a
> copy of Paul's Clojure program to print the hash code of all of the
> solutions in the set returned by solve, and there are *many* pairs of
> solutions that have identical hash values, and thus collide in a
> PersistentHashSet.  Below is an excerpt from that output:
>
>     97392280 = (hash #{[:N [2 0]] [:R [4 2]] [:K [0 0]] [:Q [1 3]] [:B [3
> 0]]})
>     97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [1 2]] [:Q [4 3]] [:B [2
> 0]]})
>     97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [2 1]] [:Q [1 4]] [:B [4
> 0]]})
>     97392280 = (hash #{[:N [4 0]] [:K [0 0]] [:R [3 3]] [:Q [1 2]] [:B [2
> 0]]})
>     97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [4
> 0]]})
>     97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [4
> 0]]})
>     97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [3
> 0]]})
>     97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [3
> 0]]})
>     97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:Q [4 2]] [:R [2 3]] [:B [1
> 0]]})
>     97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:R [4 2]] [:Q [2 3]] [:B [1
> 0]]})
>     97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:Q [4 2]] [:R [2 3]] [:B [0
> 0]]})
>     97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:R [4 2]] [:Q [2 3]] [:B [0
> 0]]})
>     97392280 = (hash #{[:K [4 0]] [:N [0 0]] [:Q [3 2]] [:R [1 3]] [:B [2
> 0]]})
>     97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [3 2]] [:Q [0 3]] [:B [2
> 0]]})
>     97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [2 1]] [:Q [3 4]] [:B [0
> 0]]})
>     97392280 = (hash #{[:N [2 0]] [:K [4 0]] [:R [0 2]] [:Q [3 3]] [:B [1
> 0]]})
>
> The hash value of a set is the sum of the hashes of its elements, because
> sets are not ordered.  The hash of a vector is dependent on the order of
> the elements.  In fact, for 2-element vectors it is 31*(hash of 1st elem) +
> (hash of 2nd elem).  The hash value of a small integer is just the integer
> value itself.
>
> Thus with many small integers in 2-element vectors like in this problem,
> all it takes is having a set of vectors where the first elements sum to the
> same value, and the second elements sum to the same value, and the hash of
> the whole set of vectors will be equal.
>
> Exactly what would be the best way to improve the hash code for situations
> like this, I am not sure yet.
>
> Andy
>
>
> On Wed, Oct 23, 2013 at 8:46 AM, Mark Engelberg 
> <mark.engelb...@gmail.com>wrote:
>
>> For those of you playing along at home, I also ran Paul's chess program
>> in YourKit with the sampling profiler.  Here are all the calls that appear
>> in the "HotSpots" view, along with time spent in those methods.
>>
>> java.io.PushbackInputStream.read() 422977
>> clojure.lang.PersistentHashMap$HashCollisionNode.findIndex(Object) 206687
>> clojure.lang.Util.equiv(Object, Object) 204984
>> clojure.lang.PersistentHashMap$BitmapIndexedNode.find(int, int, Object,
>> Object) 145968
>> clojure.lang.PersistentHashMap$NodeSeq.create(Object[]) 81031
>> clojure.lang.APersistentVector.doEquiv(IPersistentVector, Object) 41656
>> clojure.lang.Util.dohasheq(IHashEq) 31453
>> clojure.lang.RT.next(Object) 28671
>> clojure.lang.PersistentHashMap$NodeSeq.next() 24484
>> clojure.lang.PersistentVector$1.<init>(PersistentVector, int, int) 24015
>>
>> I concur with Paul that it looks like the program is spending nearly all
>> its time mired in hashing.  Even the calls on here that aren't obviously
>> related to hashing turn out on further inspection to be related to hashing.
>>
>> For example, doing a backtrace on Util.equiv, one sees that the =
>> operation in the allowed? function is taking negligible time and the equiv
>> method time is almost entirely spent in hash set lookup resolution.
>> Similarly, the RT.next method is negligible time spent stepping through his
>> for sequence, and appears to be almost entirely about stepping through hash
>> collision nodes.
>>
>> My best guess at this point is that there is some sort of bug with the
>> set implementation causing too many items to hash to the same bucket.  I
>> have seen this behavior before from time to time when profiling some of my
>> own code (a suspiciously inordinate amount of time spent resolving hash
>> collisions) but have never been able to pin it down with such a pure
>> example.
>>
>> I'll admit that my skills in reading and making sense out of profiler
>> output is pretty weak, but it looks to me that that's what is going on here.
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to