If you do not really care about performance because there are not that may geometric objects (say thousands), I guess that this would work (bit pattern as string).
{ 0.5 binaryLiteralString hash. -0.5 binaryLiteralString hash. } Phil On Fri, Dec 9, 2016 at 6:45 PM, Ben Coman <b...@openinworld.com> wrote: > > > On Fri, Dec 9, 2016 at 10:51 PM, Hilaire <hila...@drgeo.eu> wrote: > >> Hi, >> >> In Dr. Geo I rely a lot on hash value for the geometric object. It >> largely improves speed up to compare geometric items (to avoid >> duplicated items). For example to detect when the user is building an >> already created object. >> >> Also when describing geometric sketch with Smalltalk code, hundred or >> thousand of geometric items are created. The hash value speeds up >> equality check (otherwise true comparing is iterative in the geometric >> sketch tree and it slow down) >> >> But sometime hash collisions occur. I guess in that case, DrGeo should >> do the real equality check, or may be I am overusing the hash? I am >> interested on advices. As for now I am doing nothing special. >> >> Here is an annoying hash collision with Pharo 3/ Pharo 5 I think should >> not occur: >> >> (0.5@0.25) hash >> 67006464 >> (-0.5@0.25) hash >> 67006464 >> >> >> Squeak5.0 does not have this collision. >> >> It looks related to the Float hash implementation a bit different. >> >> What do you think? >> > > More the point, this seems bad... > -0.5 hash "==> 57344" > 0.5 hash "==> 57344" > > I read page 307 of "Hashing in Smalltalk Theory and Practice" ** > regarding Squeak's (Pharo's) Point>>hash method... > "at this time we will not test this hash function for points with our > floating point data sets because we already know that [Pharo's] hash > function for floats is of bad quality." > > ** Errm... actually first time I've opened the book since I bought it a > couple of years ago. > > And from page 273 reviewing Float's hash... > "because of how the bitAnd: is used, the hash function will discard 32 > of the original 64 bits. In other words what seems a hash function using > all the availabel bytes is anything but. [Also] because of the way the two > 16 bit chunks are added togather, by the time bitshift:-8 is sent we will > only have, at most, 17 bits worth of hash value. Effectively all the > bitshift: accomplishes is to get rid of the 8 lowest bits, which are zero > anyway because of how the bitAnd: message were set up. Can these objections > add up to particularly poor performance? The answer is definitely yes - > since we will have only 17 bit hash values this hash function will be > particularly unsuitable for any data set containing more than 2^17 = > 131,072 objects" > > Float>>hash "Pharo" > (self isFinite and: [self fractionPart = 0.0]) ifTrue: [^self truncated > hash]. > ^ (((self basicAt: 1) bitAnd: 16r00FFFF00) + > ((self basicAt: 2) bitAnd: 16r00FFFF00)) bitShift: -8 > > The suggested fix depended on being willing to let go of byte ordering > (which we might not want different results on different platforms) > Float>>hash > ^ByteArray > hashBytes: self > startingWith: self species hash > > but anyway that doesn't actually help this case... > -0.5 hash "==>120484352" > 0.5 hash "==>120484352" > > I see Squeak5's float hash has changed... > Float>>hash "Squeak5" > (self isFinite and: [self fractionPart = 0.0]) ifTrue: [^self truncated > hash]. > ^ ((self basicAt: 1) bitShift: -4) + > ((self basicAt: 2) bitShift: -4) > > which gives... > 0.5 hash "==>66977792" > -0.5 hash "==>201195520" > > which solves the immediate problem, but I'm not capable of analysis its > quality further. > > cheers -ben >