Re: [Pharo-users] Equals and HashCode Builder

Stephane Ducasse Fri, 06 Oct 2017 01:02:36 -0700

Here is what I extracted from older emails

>From David Allouche in Pharo-dev to see if
The ==hash== method is used in Pharo to separate objects in
bucket-based data structures.

!! Property

It must have one required property which is required data integrity:

!!!! A.
Objects that are equal according to ==\=== must have equal hashes.

It should have two desirable properties that are required to support
the algorithmic assumptions of bucket-based data structures:

!!!! B.
Objects that are not equal should have different hashes. Ideally, the
probability of hash collision should by 1/N where N is the size of
hash value space.

!!!! C. The hash values should spread as much as possible over their
value space. For example: ProtoObject>>#identityHash.

How these two desirable properties can be implemented depends on the
actual distribution of the values used to compute the hash.

An object can safely use XOR hashing if its instance variables are all
different objects that already provide a hash method with these two
properties.

@@todo check following sentence because the use of singleton is obscure

For example, an object whose behaviour is defined by several of
singleton delegates of different classes that do not override
Object>>#hash.

But if the instance variables are not guaranteed to have those
properties, it is necessary to more thoroughly "mix the bits". For
example the  method ==hash== of ==SequenceableCollection== illustrates
this point.

[[[
SequenceableCollection >> hash
| hash |
hash := self species hash.
1 to: self size do: [:i | hash := (hash + (self at: i) hash) hashMultiply].
^ hash
]]]

The mixing of the bits is done by the combination of addition and
==hashMultiply==. The value of ==species hash== does not need to be
processed by ==hashMultiply==, because it is probably computed by the
method =identityHash== of ==ProtoObject==, and that already provides
property C.

!!! When we should not use XOR

==LayoutFrame== is a particularly good example of a data structure
that should not use XOR hashing. Its instance variables provide none
of the required properties:

- ==SmallInteger>>hash== is just ==^self==.
- In addition, common ==LayoutFrame== instances tend to use pairs of
identical values in their instance variables. Like ==0@0 corner:
1@1==, or ==Margin fromNumber: 10==.

A good hash function for LayoutFrame needs to:

- Get hashes for each instance variable that provides properties A and B.
- Combine them in a way that is order-sensitive (to maintain property
B) and that does some extra mixing (to provide property C).

Now, we can assume that common values for the instance variables will
be instances of ==SmallInteger==, ==Float==, or ==Fraction==.

You can see in ==Fraction>>hash==, that a comment mentions the
assumption that the fraction is already reduced, that is what makes it
acceptable to use bitXor.

The hash of ==SmallInteger== does provide properties A and B, but not
property C. The hash of ==Float== is harder to understand, but tests
show that it provides distinct values for 0.5, 0.25 and 0.125. So it
hopefully provides B for the range of values of interest to
==LayoutFrame==.

Another class that has similar hashing constraints to ==LayoutFrame==
is ==Point==. Its ==hash== method is:

[[[
Point >> hash
"Hash is reimplemented because = is implemented."
^(x hash hashMultiply + y hash) hashMultiply
]]]

It does not include species in the computation, which is less than
ideal because it favours collisions with objects of different species
that have a similar content and a the same hash algorithm.

Finally, it is probably not necessary or useful to use the accessors
to get to the instance variables.

So a good implementation for LayoutFrame is

[[[
LayoutFrame >> hash
| hash |
hash := self species hash
hash := (hash + leftFraction hash) hashMultiply
hash := (hash + leftOffset hash) hashMultiply
hash := (hash + topFraction hash) hashMultiply
hash := (hash + topOffset hash) hashMultiply
hash := (hash + rightFraction hash) hashMultiply
hash := (hash + rightOffset hash) hashMultiply
hash := (hash + bottomFraction hash) hashMultiply
hash := (hash + bottomOffset hash) hashMultiply
^ hash
]]]

On Fri, Oct 6, 2017 at 9:59 AM, Stephane Ducasse
<stepharo.s...@gmail.com> wrote:
> Noury also proposed a trait to avoid to systematically have to implement hash.
> Vitor why don't you take it as a small project to propose a solution for 
> Pharo.
>
> On Thu, Oct 5, 2017 at 6:34 PM, Vitor Medina Cruz <vitormc...@gmail.com> 
> wrote:
>> Yes, canEqual implementation also make #= be commutative.
>>
>> On Tue, Oct 3, 2017 at 11:11 AM, Denis Kudriashov <dionisi...@gmail.com>
>> wrote:
>>>
>>>
>>> 2017-10-02 17:30 GMT+02:00 Denis Kudriashov <dionisi...@gmail.com>:
>>>>
>>>>
>>>> 2017-10-02 17:13 GMT+02:00 Vitor Medina Cruz <vitormc...@gmail.com>:
>>>>>
>>>>> I am sorry, not species, but #isKindOf istead of #= to compare classes.
>>>>
>>>>
>>>> It is bad idea. #= should be transitive.
>>>
>>>
>>> Oh, I used wrong word, shame on me :). I tried to say commutative.
>>>
>>>>
>>>> How you will generate it with isKindOf: logic? You need to know common
>>>> parent.
>>>>
>>>> Also I not remember cases where I was needed two instances of different
>>>> classes to be equal.
>>>> And I can imaging the problems which it will lead to.
>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 2, 2017 at 11:57 AM, Denis Kudriashov <dionisi...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 2017-10-02 16:37 GMT+02:00 Sean P. DeNigris <s...@clipperadams.com>:
>>>>>>>
>>>>>>>
>>>>>>> Two questions/comments about the generated code:
>>>>>>> 1. #=
>>>>>>>         ...
>>>>>>>         self class = anObject class "should compare #species instead?"
>>>>>>>                 ifFalse: [ ^ false ].
>>>>>>>         ...
>>>>>>> Typically, I've seen #species instead of #class in the guard
>>>>>>> statement.
>>>>>>> Should we change it to that?
>>>>>>
>>>>>>
>>>>>> I doubt that it is important for domain classes. Because I never saw
>>>>>> the user of #species which is not a kind of Collection. And for 
>>>>>> collections
>>>>>> this refactoring is not valid anyway.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2. #hash
>>>>>>>         ^ var1 hash bitXor: (var2 hash bitXor: var3 hash)
>>>>>>> Is this implementation always safe? It's what I usually hand roll
>>>>>>> based on
>>>>>>> what I've seen, but Andres Valloud wrote a whole (large) book on
>>>>>>> hashing, so
>>>>>>> I've always wondered if I was missing something…
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----
>>>>>>> Cheers,
>>>>>>> Sean
>>>>>>> --
>>>>>>> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Re: [Pharo-users] Equals and HashCode Builder

Reply via email to