I've posted a patch that makes java.util.{List,Map,Map.Entry,Set}
hashes consistent with those of appropriate Clojure collection types
on the ticket.

Performance of repeated hasheq lookups on a single PHM actually seems
to be slightly better with this patch applied. Adding the hasheqs of a
PHM, a string and a long takes slightly longer than on 1.6.0. See the
ticket for details.

The patch also changes "default" return-the-hashCode behaviour of
hasheq to match Strings. That's mostly because I originally wanted to
leave out the String case altogether, but that seems to hurt String
hashing quite badly. The original branch is up on GitHub, however (see
ticket for link).


On 13 May 2014 04:13, 'Mikera' via Clojure Dev
<clojure-...@googlegroups.com> wrote:
> On Monday, 12 May 2014 00:33:25 UTC+1, Alex Miller wrote:
>>
>> On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
>>>
>>> OK..... this thread is a bit worrying. If I understand correctly, it
>>> means that we've now got inconsistent hash and equals functions. I suspect
>>> this hasn't bitten many people yet simply because it is unusual to mix Java
>>> and Clojure collections as keys in the same structure, but it seems like a
>>> really nasty issue.
>>
>>
>> Reports of real problems in real codebases would change my own opinion of
>> the severity of this issue.
>
>
> Are you suggesting correctness issues in APIs aren't severe? This is broken
> according to the documentation. From the docstrings:
>
> clojure.core/= "Same as
>   Java x.equals(y) except it also works for nil, and compares
>   numbers and collections in a type-independent manner."
>
> clojure.core/hash "Returns the hash code of its argument. Note this is the
> hash code
>   consistent with =, and thus is different than .hashCode for Integer,
>   Short, Byte and Clojure collections."
>
> This promises consistency, and also promises behaviour that works like
> x.equals(y) and x.hashCode() - except for a a few defined exceptions. Which
> means I should expect them to work on arbitrary Java objects.
>
> As a result, all sorts of code that depends on hashes directly or indirectly
> now has subtle bugs. clojure.core/merge is now broken for example:
>
> (let [a {[1 2] :a}
>       b {(java.util.ArrayList. [1 2]) :a}
>       m (zipmap (range 20) (range 20))]
>   [(= a b)   (= (merge m a) (merge m b))])
> => [true false]
>
> I hope everyone agrees that this isn't the kind of behaviour we want in core
> functions?
>
>>
>>>
>>> Hash and equality functions generally have a very simple rule: if two
>>> things are equal they must have the same hashcode. But now we've got:
>>>
>>> (= (java.util.ArrayList. [1 2]) [1 2])
>>> => true
>>>
>>> (hash (java.util.ArrayList. [1 2]))
>>> => 994
>>>
>>> (hash [1 2])
>>> => 156247261
>>>
>>> I consider this a major defect. If we want to define our own equals and
>>> hashcode that's fine, but then these functions absolutely should be
>>> consistent with each other.
>>>
>>> If it is really true that non-Clojure collections aren't considered as
>>> values and aren't expected to participate in hash/= comparisons then I'd
>>> expect an exception, not an erroneous value. Silent potential corruption of
>>> data is *much* worse than a noisy failure. But I think even that is a bad
>>> direction to go: easy Java ecosystem interop is definitely one of the main
>>> reasons why I'm using Clojure, and I'd like to see this maintained.
>>
>>
>> Please be more specific about where an exception could be thrown (equals
>> vs hash vs collection usage).
>>
>>> I can think of only two sensible ways to fix this inconsistency:
>>> a) Roll back the hash changes. Correctness is much more important than a
>>> performance edge case.
>>
>>
>> We're not doing that.
>>
>>>
>>> (I suspect the hash changes have hurt average case performance anyway...)
>>
>>
>> We have made hash calculation slightly slower to generate better hashes
>> across the board. Timing info is at
>> http://dev.clojure.org/display/design/Better+hashing but in general, the
>> time increases per hash are on the order of nanoseconds for simple values
>> and sub-millisecond for more complicated strings and collections (where
>> hashes are cached and times are likely to be dwarfed by other things). The
>> better hashes avoid catastrophically bad (and relatively easy to find)
>> cases.
>>
>>
>> It is quite challenging to predict the performance impact of the hashing
>> changes in Strings, keywords, and symbols due to the effects of usage
>> patterns, interning, caching, etc. I am aware of several people that have
>> reported better performance overall in 1.6 and I don't think I've heard any
>> reports of significantly slower performance in 1.6. I'm calling FUD on this
>> unless you have data to report.
>
>
> It's absolutely not FUD. I'm generally a pretty big Clojure evangelist....
> and my aim here is simply to be very clear about issues so that we can
> improve Clojure. I hope all my comments are taken in that spirit.
>
> The page you have linked shows that the hashing changes have slowed down
> various workloads (including the full Clojure compile). The Clojure
> micro-benchmarks posted recently also show various slowdowns related to
> hashing.
>
> For good measure (and because I care about small vector performance for
> numerical work / core.matrix stuff) here's my own criterium benchmark:
>
> ;; Clojure 1.6.0
> (c/quick-bench
>      (dotimes [i 1000] (hash [i i])))
> WARNING: Final GC required 9.3578822831726 % of runtime
> WARNING: Final GC required 119.680511859946 % of runtime
> Evaluation count : 1950 in 6 samples of 325 calls.
>              Execution time mean : 339.879773 µs
>     Execution time std-deviation : 98.446114 µs
>    Execution time lower quantile : 240.545585 µs ( 2.5%)
>    Execution time upper quantile : 453.154577 µs (97.5%)
>                    Overhead used : 4.007792 ns
>
> ;; Clojure 1.5.1
> (c/quick-bench
>      (dotimes [i 1000] (hash [i i])))
> WARNING: Final GC required 50.6032996175532 % of runtime
> Evaluation count : 5058 in 6 samples of 843 calls.
>              Execution time mean : 185.715873 µs
>     Execution time std-deviation : 16.476580 µs
>    Execution time lower quantile : 165.946452 µs ( 2.5%)
>    Execution time upper quantile : 207.266354 µs (97.5%)
>                    Overhead used : 4.905631 ns
>
> That's about a 1.8x slowdown, which seems quite noticeable (I conclude the
> cost here is from hashing changes, since the performance of vector
> construction is unchanged - the same benchmarks omitting the call to hash
> take about 57 microseconds in both Clojure 1.5.1 and 1.6.0)
>
> So - the data I've seen so far supports the assertion that the hashing
> changes slow down performance in the average / typical cases.
>
> Please Note: I'm not saying that the hashing changes are bad overall (I
> certainly like the guarantee of better hashes and the edge cases are indeed
> nasty). But let's not deny that this is a real trade-off.
>
>
>>
>>
>>>
>>> b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons
>>> evaluate to false (on the grounds that ArrayLists aren't Clojure
>>> collections, so should not be considered equal).
>>
>>
>> Equality comparison between Clojure and Java collections is something that
>> seems relatively common. For example, calling a Java API that returns a List
>> and comparing the returned value with a vector. I don't want to lose that
>> ability.
>
>
> Agreed. This would be a breaking change. I certainly rely on this a lot in
> test suites.
>
>>
>>
>> c) Fail on hasheq of Java collections
>>
>> Adding this check would decrease the performance of hasheq for many other
>> types. Given what I know right now, I'm not inclined to make that tradeoff.
>
>
> I don't think this is a good idea either. Worse than performance, this would
> be a pretty severe breaking change.
>
>>
>>
>> d) Fail when conj'ing Java collections into Clojure map keys or sets.
>>
>> This would scope the check more but still add a performance hit for this
>> case on all other types. Note that Clojure maps or sets of only Java
>> collections currently work so this would take away functionality that might
>> be useful even if in a gray area.
>
>
> Again this is a big breaking change in behaviour. I don't think this is a
> good idea.
>
> Ultimately, I think the only good way to fix all this is to make hash and =
> consistent for all types. If we want to avoid significant breaking changes
> and we also want to keep the new better hash functions, then I think this
> implies extending the new hash calculation to relevant Java collections.
>
>>
>>
>>
>>
>>
>>>
>>>
>>> On Saturday, 10 May 2014 15:51:24 UTC+1, John Hume wrote:
>>>>
>>>> Thanks for the ticket pointer. I didn't find it in my initial search
>>>> because it "Affects Version/s: None" :-/, and I knew this behavior was new
>>>> to 1.6.
>>>>
>>>> For those not up for the medium-length comment-thread read:
>>>> non-Clojure-aware Java Collections Framework interface instances are not
>>>> considered values,* and 'Rich says "all bets should be off for hasheq/equiv
>>>> of non-values."' That's a bit down in the details, but I think the upshot 
>>>> is
>>>> that they're not currently supported as set members or hash keys.
>>>> Unfortunately they used to be supported (unintentionally, I guess) due to
>>>> Clojure following Java's lead, so I have some interop-heavy code to review
>>>> and test very carefully. Though the above seems to frame non-support as
>>>> based on principle, it seems the driver is keeping the new hasheq stuff
>>>> highly performant. I would hope a performant implementation that supported
>>>> the Collections interfaces would be accepted for some future version.
>>>>
>>>> *If there is one, I'd like to read a clear definition of what "values"
>>>> means in Clojure. The headings on http://clojure.org/data_structures are
>>>> almost an enumeration of them, except the "Collections" section talks a
>>>> little about Java Collections classes, hashCode, and hasheq without making
>>>> it clear that non-Clojure collections are not values. Unfortunately the
>>>> current definition seems to admit only the Java primitives and
>>>> near-primitives (String and primitive wrappers) and clojure-aware types.
>>>>
>>>> What "not supported" looks like via maps:
>>>> user> *clojure-version*
>>>> {:major 1, :minor 6, :incremental 0, :qualifier nil}
>>>> user> (hash-map [1 2] "v" (java.util.ArrayList. [1 2]) "a")
>>>> {[1 2] "a", [1 2] "v"}
>>>> user> {[1 2] "v" (java.util.ArrayList. [1 2]) "a"}
>>>> IllegalArgumentException Duplicate key: [1 2]
>>>> clojure.lang.PersistentArrayMap.createWithCheck 
>>>> (PersistentArrayMap.java:70)
>>>> user> (-> {} (assoc [1 2] "v") (assoc (java.util.ArrayList. [1 2]) "a"))
>>>> {[1 2] "a"}
>>>>
>>>>
>>>> On Friday, May 9, 2014 6:34:26 PM UTC-5, Andy Fingerhut wrote:
>>>>>
>>>>> Mike is correct.  This change in behavior is due to the hashing changes
>>>>> in Clojure 1.6.  See the comments on ticket
>>>>> http://dev.clojure.org/jira/browse/CLJ-1372 for some discussion of whether
>>>>> this is considered a bug.  It appears that perhaps the hash consistency is
>>>>> not promised for mutable objects, only immutable ones.
>>>>>
>>>>> Below are a couple more REPL transcripts that illustrate the change in
>>>>> behavior:
>>>>>
>>>>> user=> *clojure-version*
>>>>> {:major 1, :minor 5, :incremental 1, :qualifier nil}
>>>>> user=> (= [1 2] (java.util.ArrayList. [1 2]))
>>>>> true
>>>>> user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
>>>>> (994 994)
>>>>> user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
>>>>> {[1 2] "alist"}
>>>>>
>>>>>
>>>>>
>>>>> user=> *clojure-version*
>>>>> {:major 1, :minor 6, :incremental 0, :qualifier nil}
>>>>> user=> (= [1 2] (java.util.ArrayList. [1 2]))
>>>>> true
>>>>> user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
>>>>> (156247261 994)
>>>>> user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
>>>>> {[1 2] "alist", [1 2] "vec"}
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Clojure Dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to clojure-dev...@googlegroups.com.
>>> To post to this group, send email to cloju...@googlegroups.com.
>>>
>>> Visit this group at http://groups.google.com/group/clojure-dev.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure-dev+unsubscr...@googlegroups.com.
> To post to this group, send email to clojure-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/clojure-dev.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to