One possibility of a change to PersistentHashSet and PersistentHashMap that would help avoid these cases with pathologically bad performance:
If we had a 'universal comparator', i.e. a comparison function that provided a total order on any pair of values that anyone would ever want to put into a set or use as a map key, then instead of having linked lists for values that collide, we could have trees like those in the implementations of sorted-maps and sorted-sets today. Such a comparison function might look something like 'cc-cmp' a couple of pages down from this link: https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/comparators.md#comparators-that-work-between-different-types However, I believe there will always be some types where such a comparator will fail, and thus not truly be 'universal'. Still, if it hit the sweet spot of 99.99% of the use cases people cared about, it might be good to have. Andy On Wed, Oct 23, 2013 at 9:06 AM, Andy Fingerhut <andy.finger...@gmail.com>wrote: > Mark, I think you have hit the nail on the head. I have instrumented a > copy of Paul's Clojure program to print the hash code of all of the > solutions in the set returned by solve, and there are *many* pairs of > solutions that have identical hash values, and thus collide in a > PersistentHashSet. Below is an excerpt from that output: > > 97392280 = (hash #{[:N [2 0]] [:R [4 2]] [:K [0 0]] [:Q [1 3]] [:B [3 > 0]]}) > 97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [1 2]] [:Q [4 3]] [:B [2 > 0]]}) > 97392280 = (hash #{[:N [3 0]] [:K [0 0]] [:R [2 1]] [:Q [1 4]] [:B [4 > 0]]}) > 97392280 = (hash #{[:N [4 0]] [:K [0 0]] [:R [3 3]] [:Q [1 2]] [:B [2 > 0]]}) > 97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [4 > 0]]}) > 97392280 = (hash #{[:N [3 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [4 > 0]]}) > 97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [0 2]] [:Q [2 3]] [:B [3 > 0]]}) > 97392280 = (hash #{[:N [4 0]] [:K [1 0]] [:R [2 3]] [:Q [0 2]] [:B [3 > 0]]}) > 97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:Q [4 2]] [:R [2 3]] [:B [1 > 0]]}) > 97392280 = (hash #{[:K [3 0]] [:N [0 0]] [:R [4 2]] [:Q [2 3]] [:B [1 > 0]]}) > 97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:Q [4 2]] [:R [2 3]] [:B [0 > 0]]}) > 97392280 = (hash #{[:N [1 0]] [:K [3 0]] [:R [4 2]] [:Q [2 3]] [:B [0 > 0]]}) > 97392280 = (hash #{[:K [4 0]] [:N [0 0]] [:Q [3 2]] [:R [1 3]] [:B [2 > 0]]}) > 97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [3 2]] [:Q [0 3]] [:B [2 > 0]]}) > 97392280 = (hash #{[:K [4 0]] [:N [1 0]] [:R [2 1]] [:Q [3 4]] [:B [0 > 0]]}) > 97392280 = (hash #{[:N [2 0]] [:K [4 0]] [:R [0 2]] [:Q [3 3]] [:B [1 > 0]]}) > > The hash value of a set is the sum of the hashes of its elements, because > sets are not ordered. The hash of a vector is dependent on the order of > the elements. In fact, for 2-element vectors it is 31*(hash of 1st elem) + > (hash of 2nd elem). The hash value of a small integer is just the integer > value itself. > > Thus with many small integers in 2-element vectors like in this problem, > all it takes is having a set of vectors where the first elements sum to the > same value, and the second elements sum to the same value, and the hash of > the whole set of vectors will be equal. > > Exactly what would be the best way to improve the hash code for situations > like this, I am not sure yet. > > Andy > > > On Wed, Oct 23, 2013 at 8:46 AM, Mark Engelberg > <mark.engelb...@gmail.com>wrote: > >> For those of you playing along at home, I also ran Paul's chess program >> in YourKit with the sampling profiler. Here are all the calls that appear >> in the "HotSpots" view, along with time spent in those methods. >> >> java.io.PushbackInputStream.read() 422977 >> clojure.lang.PersistentHashMap$HashCollisionNode.findIndex(Object) 206687 >> clojure.lang.Util.equiv(Object, Object) 204984 >> clojure.lang.PersistentHashMap$BitmapIndexedNode.find(int, int, Object, >> Object) 145968 >> clojure.lang.PersistentHashMap$NodeSeq.create(Object[]) 81031 >> clojure.lang.APersistentVector.doEquiv(IPersistentVector, Object) 41656 >> clojure.lang.Util.dohasheq(IHashEq) 31453 >> clojure.lang.RT.next(Object) 28671 >> clojure.lang.PersistentHashMap$NodeSeq.next() 24484 >> clojure.lang.PersistentVector$1.<init>(PersistentVector, int, int) 24015 >> >> I concur with Paul that it looks like the program is spending nearly all >> its time mired in hashing. Even the calls on here that aren't obviously >> related to hashing turn out on further inspection to be related to hashing. >> >> For example, doing a backtrace on Util.equiv, one sees that the = >> operation in the allowed? function is taking negligible time and the equiv >> method time is almost entirely spent in hash set lookup resolution. >> Similarly, the RT.next method is negligible time spent stepping through his >> for sequence, and appears to be almost entirely about stepping through hash >> collision nodes. >> >> My best guess at this point is that there is some sort of bug with the >> set implementation causing too many items to hash to the same bucket. I >> have seen this behavior before from time to time when profiling some of my >> own code (a suspiciously inordinate amount of time spent resolving hash >> collisions) but have never been able to pin it down with such a pure >> example. >> >> I'll admit that my skills in reading and making sense out of profiler >> output is pretty weak, but it looks to me that that's what is going on here. >> >> -- >> -- >> You received this message because you are subscribed to the Google >> Groups "Clojure" group. >> To post to this group, send email to clojure@googlegroups.com >> Note that posts from new members are moderated - please be patient with >> your first post. >> To unsubscribe from this group, send email to >> clojure+unsubscr...@googlegroups.com >> For more options, visit this group at >> http://groups.google.com/group/clojure?hl=en >> --- >> You received this message because you are subscribed to the Google Groups >> "Clojure" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to clojure+unsubscr...@googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> > > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.