If you do not really care about performance because there are not that many
geometric objects (say, thousands), I guess that this would work (using the
bit pattern as a string).

{
    0.5 binaryLiteralString hash.
    -0.5 binaryLiteralString hash.
}
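In Python terms (a hypothetical analogue of the idea, not Pharo code), this
amounts to deriving the hash from the float's full 64-bit IEEE 754 pattern,
which still contains the sign bit:

```python
import struct

def bit_pattern_hash(f):
    # Hypothetical helper: use the float's full 64-bit IEEE 754 pattern
    # as the hash basis, so the sign bit participates and 0.5 / -0.5 differ.
    return struct.unpack('>Q', struct.pack('>d', f))[0]

print(bit_pattern_hash(0.5))   # 4602678819172646912  (16r3FE0000000000000)
print(bit_pattern_hash(-0.5))  # 13826050856027422720 (16rBFE0000000000000)
```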

Phil


On Fri, Dec 9, 2016 at 6:45 PM, Ben Coman <b...@openinworld.com> wrote:

>
>
> On Fri, Dec 9, 2016 at 10:51 PM, Hilaire <hila...@drgeo.eu> wrote:
>
>> Hi,
>>
>> In Dr. Geo I rely a lot on hash values for geometric objects. They
>> greatly speed up comparison of geometric items (to avoid duplicates),
>> for example to detect when the user is building an already created
>> object.
>>
>> Also, when describing a geometric sketch with Smalltalk code, hundreds
>> or thousands of geometric items are created. The hash value speeds up
>> the equality check (otherwise a true comparison iterates over the
>> geometric sketch tree, which is slow).
>>
>> But sometimes hash collisions occur. I guess in that case Dr. Geo
>> should fall back on a real equality check, or maybe I am overusing the
>> hash? I am interested in advice. For now I am doing nothing special.
>>
>> Here is an annoying hash collision in Pharo 3 / Pharo 5 that I think
>> should not occur:
>>
>> (0.5@0.25) hash
>>          67006464
>> (-0.5@0.25) hash
>>         67006464
>>
>>
>> Squeak5.0 does not have this collision.
>>
>> It looks related to Squeak's slightly different Float hash
>> implementation.
>>
>> What do you think?
>>
>
> More to the point, this seems bad...
> -0.5 hash "==> 57344"
> 0.5 hash "==> 57344"
>
> I read page 307 of "Hashing in Smalltalk Theory and Practice" **
> regarding Squeak's (Pharo's) Point>>hash method...
>    "at this time we will not test this hash function for points with our
> floating point data sets because we already know that [Pharo's] hash
> function for floats is of bad quality."
>
> ** Errm... actually first time I've opened the book since I bought it a
> couple of years ago.
>
> And from page 273 reviewing Float's hash...
>     "because of how the bitAnd: is used, the hash function will discard 32
> of the original 64 bits.  In other words what seems a hash function using
> all the available bytes is anything but. [Also] because of the way the two
> 16 bit chunks are added together, by the time bitShift: -8 is sent we will
> only have, at most, 17 bits worth of hash value.  Effectively all the
> bitShift: accomplishes is to get rid of the 8 lowest bits, which are zero
> anyway because of how the bitAnd: messages were set up. Can these objections
> add up to particularly poor performance? The answer is definitely yes -
> since we will have only 17 bit hash values this hash function will be
> particularly unsuitable for any data set containing more than 2^17 =
> 131,072 objects"
>
>     Float>>hash "Pharo"
>         (self isFinite and: [self fractionPart = 0.0])
>             ifTrue: [^self truncated hash].
>         ^ (((self basicAt: 1) bitAnd: 16r00FFFF00) +
>            ((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8
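If I read that right, the collision above is baked in: the sign bit lives in
the top byte of the first 32-bit word, and that byte is exactly what the
bitAnd: mask discards. A rough Python sketch of the same bit manipulation
(pharo_float_hash is just my name for it, and it covers only the non-integer
branch):

```python
import struct

def pharo_float_hash(f):
    # Split the double into two big-endian 32-bit words, mimicking
    # (self basicAt: 1) and (self basicAt: 2) on the Float's bits.
    hi, lo = struct.unpack('>II', struct.pack('>d', f))
    # bitAnd: 16r00FFFF00 zeroes the top byte of each word -- including
    # the sign bit -- so x and -x always collide.
    return ((hi & 0x00FFFF00) + (lo & 0x00FFFF00)) >> 8

print(pharo_float_hash(0.5))   # 57344
print(pharo_float_hash(-0.5))  # 57344
```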
>
> The suggested fix depended on being willing to let go of byte ordering
> (and we might not want different results on different platforms)...
>     Float>>hash
>         ^ ByteArray
>             hashBytes: self
>             startingWith: self species hash
>
> but anyway that doesn't actually help this case...
>     -0.5 hash "==>120484352"
>     0.5 hash "==>120484352"
>
> I see Squeak5's float hash has changed...
>      Float>>hash "Squeak5"
>          (self isFinite and: [self fractionPart = 0.0])
>              ifTrue: [^self truncated hash].
>          ^ ((self basicAt: 1) bitShift: -4) +
>            ((self basicAt: 2) bitShift: -4)
>
> which gives...
>     0.5 hash "==>66977792"
>    -0.5 hash "==>201195520"
>
> which solves the immediate problem, but I'm not able to analyse its
> quality any further.
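For what it's worth, the Squeak5 version at least keeps the sign bit: it sits
in the top four bits of the first word, which survive the bitShift: -4. The
same kind of Python sketch (squeak5_float_hash is my name; again the
non-integer branch only):

```python
import struct

def squeak5_float_hash(f):
    # Same two big-endian 32-bit words as the Float's bit pattern.
    hi, lo = struct.unpack('>II', struct.pack('>d', f))
    # Shifting right by 4 drops only the low 4 bits of each word; the
    # sign bit at the top of `hi` survives, so 0.5 and -0.5 now differ.
    return (hi >> 4) + (lo >> 4)

print(squeak5_float_hash(0.5))   # 66977792
print(squeak5_float_hash(-0.5))  # 201195520
```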
>
> cheers -ben
>
