On Fri, Dec 9, 2016 at 10:51 PM, Hilaire <hila...@drgeo.eu> wrote:
> Hi,
>
> In Dr. Geo I rely a lot on hash values for geometric objects. They
> greatly speed up comparison of geometric items (to avoid duplicates),
> for example to detect when the user is building an already created
> object.
>
> Also, when describing a geometric sketch with Smalltalk code, hundreds
> or thousands of geometric items are created. The hash value speeds up
> the equality check (otherwise a true comparison walks the geometric
> sketch tree iteratively, which is slow).
>
> But sometimes hash collisions occur. I guess in that case DrGeo should
> do the real equality check, or maybe I am overusing the hash? I am
> interested in advice. For now I am doing nothing special.
>
> Here is an annoying hash collision with Pharo 3 / Pharo 5 that I think
> should not occur:
>
>     (0.5@0.25) hash   "==> 67006464"
>     (-0.5@0.25) hash  "==> 67006464"
>
> Squeak5.0 does not have this collision. It looks related to the Float
> hash implementation being a bit different.
>
> What do you think?
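On the collision question itself: in any hashed collection the hash only narrows down the candidates; whenever two hashes collide, the collection falls back to the full = comparison automatically, so collisions cost speed but never correctness - nothing special needs to be done as long as = is implemented correctly. A sketch of that contract in Python (the GeoPoint class and its deliberately weak hash are hypothetical, just to make the collision visible):

```python
class GeoPoint:
    """Toy stand-in for a geometric item (hypothetical class,
    only to illustrate how collections handle hash collisions)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __hash__(self):
        # Deliberately weak hash that ignores the sign of x, so that
        # (0.5, 0.25) and (-0.5, 0.25) collide -- like the Float hash
        # discussed below.
        return hash((abs(self.x), self.y))

    def __eq__(self, other):
        # The real, full comparison. A hashed collection calls this
        # automatically whenever two hashes collide.
        return (isinstance(other, GeoPoint)
                and self.x == other.x and self.y == other.y)


s = {GeoPoint(0.5, 0.25)}
print(GeoPoint(-0.5, 0.25) in s)  # False: collision resolved by __eq__
print(GeoPoint(0.5, 0.25) in s)   # True: duplicate correctly detected
```

Smalltalk Sets and Dictionaries behave the same way with hash and =; a bad hash just degrades lookups toward linear scans.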
More to the point, this seems bad...

    -0.5 hash  "==> 57344"
    0.5 hash   "==> 57344"

I read page 307 of "Hashing in Smalltalk: Theory and Practice" ** regarding Squeak's (Pharo's) Point>>hash method...

"at this time we will not test this hash function for points with our floating point data sets because we already know that [Pharo's] hash function for floats is of bad quality."

** Errm... actually the first time I've opened the book since I bought it a couple of years ago.

And from page 273, reviewing Float's hash...

"because of how the bitAnd: is used, the hash function will discard 32 of the original 64 bits. In other words, what seems a hash function using all the available bytes is anything but. [Also] because of the way the two 16 bit chunks are added together, by the time bitShift: -8 is sent we will only have, at most, 17 bits worth of hash value. Effectively all the bitShift: accomplishes is to get rid of the 8 lowest bits, which are zero anyway because of how the bitAnd: messages were set up. Can these objections add up to particularly poor performance? The answer is definitely yes - since we will have only 17 bit hash values, this hash function will be particularly unsuitable for any data set containing more than 2^17 = 131,072 objects"

    Float>>hash
        "Pharo"
        (self isFinite and: [self fractionPart = 0.0])
            ifTrue: [^self truncated hash].
        ^ (((self basicAt: 1) bitAnd: 16r00FFFF00)
            + ((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8

The fix suggested in the book depends on being willing to let go of byte ordering (and we might not want different results on different platforms)...

    Float>>hash
        ^ByteArray hashBytes: self startingWith: self species hash

but anyway, that doesn't actually help this case...

    -0.5 hash  "==> 120484352"
    0.5 hash   "==> 120484352"

I see Squeak5's Float hash has changed...

    Float>>hash
        "Squeak5"
        (self isFinite and: [self fractionPart = 0.0])
            ifTrue: [^self truncated hash].
        ^ ((self basicAt: 1) bitShift: -4)
            + ((self basicAt: 2) bitShift: -4)

which gives...
    0.5 hash   "==> 66977792"
    -0.5 hash  "==> 201195520"

which solves the immediate problem, but I'm not able to analyze its quality further.

cheers -ben
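P.S. The sign-bit loss is easy to see from the bit masks: the sign of an IEEE 754 double is bit 31 of the high 32-bit word (basicAt: 1), and bitAnd: 16r00FFFF00 keeps only bits 8-23, so +x and -x always hash alike in Pharo; the Squeak5 version shifts by only 4, which keeps the sign bit. A sketch of both bit manipulations outside Smalltalk (Python here, assuming basicAt: 1 is the high-order word) reproduces the exact numbers above:

```python
import struct


def float_words(f):
    """Split a 64-bit IEEE 754 double into its high and low 32-bit
    words, mirroring (aFloat basicAt: 1) and (aFloat basicAt: 2)."""
    bits = struct.unpack('>Q', struct.pack('>d', f))[0]
    return bits >> 32, bits & 0xFFFFFFFF


def pharo_hash(f):
    """Sketch of Pharo's Float>>hash for non-integral finite floats.
    The bitAnd: masks keep only bits 8-23 of each word, discarding
    the sign bit (bit 31 of the high word)."""
    hi, lo = float_words(f)
    return ((hi & 0x00FFFF00) + (lo & 0x00FFFF00)) >> 8


def squeak5_hash(f):
    """Sketch of Squeak5's Float>>hash for non-integral finite floats.
    Shifting each word right by 4 preserves the sign bit."""
    hi, lo = float_words(f)
    return (hi >> 4) + (lo >> 4)


print(pharo_hash(0.5), pharo_hash(-0.5))      # 57344 57344: collision
print(squeak5_hash(0.5), squeak5_hash(-0.5))  # 66977792 201195520
```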