On Dec 1, 2013, at 15:36 , Graham Cox <graham....@bigpond.com> wrote:
> Scanning my entire hard drive (excluding hidden files), which took several
> hours, sure I had plenty of collisions - but absolutely no false ones - they
> all turned out to be genuine duplicates of existing files. This is using the
> FNV-1a 64-bit hash + length approach.
>
> I’m thinking this is good enough, really. The odds of a particular user
> having two different image files that collide, and happening to add those
> exact images at once to our app must be astronomically low. Talk me out of it
> :)

IIRC, you were worried about the cost of a full compare. According to these data, the amortized cost of a full compare is effectively zero if you do a full compare when you get a collision. So do the full compare when you get a collision in order not to lose data. Then you can twiddle the hash to get you a good compromise of speed vs. collisions.

Mike Abdullah’s suggestion of file size as a first check seems ideal to me (I’ve been using that technique with string lookups to very good effect, files would work much better). I wouldn’t use a straight hash table but a slightly more sophisticated data structure using multiple comparison levels.

On Dec 1, 2013, at 18:52 , Kyle Sluder <k...@ksluder.com> wrote:

> But as a matter of principle, it’s negligent to knowingly design a system
> that will silently drop user data in normal operation. There are plenty of
> times you can make a reasonable argument for “that’s good enough,” but as far
> as I’m concerned, preserving user data is never one of them.

Seconded, thirded, …

Especially for a performance optimization when the effective performance cost of doing the final check is zero.

Marcel

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
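
A minimal sketch of the layered check discussed in this thread: file size first, then the FNV-1a 64-bit hash, then a full byte-for-byte compare only when size and hash both match. Python is used purely for illustration, and the names `DuplicateIndex`, `add`, and `fnv1a_64` are hypothetical; the thread does not show the actual implementation, so treat this as one possible reading of "multiple comparison levels" rather than the structure anyone here actually built.

```python
import filecmp
import os

# Standard FNV-1a 64-bit constants.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(path, chunk_size=1 << 16):
    """Return the FNV-1a 64-bit hash of a file's contents, read in chunks."""
    h = FNV64_OFFSET_BASIS
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for byte in chunk:
                # XOR the byte in, then multiply by the prime (mod 2**64).
                h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


class DuplicateIndex:
    """Hypothetical index: size, then hash, then a full compare."""

    def __init__(self, hash_func=fnv1a_64):
        self._hash = hash_func
        # size -> list of [path, hash or None]; hashes are filled in lazily.
        self._by_size = {}

    def add(self, path):
        """Register path. Return the path of an identical file already
        in the index, or None if this file's contents are new."""
        size = os.path.getsize(path)
        bucket = self._by_size.setdefault(size, [])
        if not bucket:
            # Level 1: no other file of this size, so no hashing is needed.
            bucket.append([path, None])
            return None

        new_hash = self._hash(path)
        for entry in bucket:
            if entry[1] is None:
                # Level 2: hash the earlier file only now that it matters.
                entry[1] = self._hash(entry[0])
            # Level 3: a hash match is only a candidate; confirm it with a
            # byte-for-byte compare so a collision never drops user data.
            if entry[1] == new_hash and filecmp.cmp(entry[0], path, shallow=False):
                return entry[0]

        bucket.append([path, new_hash])
        return None
```

Calling `index.add(path)` for each incoming file returns the earlier path when the file is a genuine duplicate and `None` otherwise. Because the full compare runs only on a size-and-hash match, and nearly all such matches are real duplicates per Graham's scan, its amortized cost stays close to zero, and the hash function can be swapped for a cheaper one without risking a false duplicate.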