On Dec 1, 2013, at 15:36 , Graham Cox <graham....@bigpond.com> wrote:

> Scanning my entire hard drive (excluding hidden files), which took several 
> hours, sure I had plenty of collisions - but absolutely no false ones - they 
> all turned out to be genuine duplicates of existing files. This is using the 
> FNV-1a 64-bit hash + length approach.
> 
> I’m thinking this is good enough, really. The odds of a particular user 
> having two different image files that collide, and happening to add those 
> exact images at once to our app must be astronomically low. Talk me out of it 
> :)

IIRC, you were worried about the cost of a full compare.  According to these 
data, the amortized cost of a full compare is effectively zero if you do a full 
compare when you get a collision.  So do the full compare when you get a 
collision in order not to lose data.  Then you can twiddle the hash to get you 
a good compromise of speed vs. collisions.  Mike Abdullah’s suggestion of file 
size as a first check seems ideal to me (I’ve been using that technique with 
string lookups to very good effect; files should work even better).  I wouldn’t 
use a straight hash table, but rather a slightly more sophisticated data 
structure using multiple comparison levels.
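
To make "multiple comparison levels" concrete, here is a minimal sketch in 
Python (not the app's actual code, and not necessarily the structure I have in 
mind above).  It checks file size first, then an FNV-1a 64-bit hash, and only 
then does a byte-for-byte compare.  The names DuplicateIndex, find_duplicate 
and hash_file are made up for illustration, and hashing whole files in 64 KB 
chunks is an assumption.

```python
import filecmp
import os
from collections import defaultdict

# Standard FNV-1a 64-bit parameters.
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64(data, h=FNV_OFFSET_64):
    """Fold the bytes in `data` into the running FNV-1a 64-bit hash `h`."""
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return h


def hash_file(path, chunk_size=1 << 16):
    """Hash a file's contents chunk by chunk (chunk size is an assumption)."""
    h = FNV_OFFSET_64
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h = fnv1a_64(chunk, h)
    return h


class DuplicateIndex:
    """Hypothetical index: level 1 size, level 2 hash, level 3 full compare."""

    def __init__(self):
        # size -> list of [path, hash or None]; hashes are computed lazily.
        self._by_size = defaultdict(list)

    def find_duplicate(self, path):
        """Return the stored path with identical contents, or None.

        A file with no duplicate is added to the index.
        """
        size = os.path.getsize(path)
        bucket = self._by_size[size]
        if not bucket:
            # Unique size so far: no hashing or comparing needed at all.
            bucket.append([path, None])
            return None
        new_hash = hash_file(path)
        for entry in bucket:
            if entry[1] is None:
                entry[1] = hash_file(entry[0])
            # Only a size + hash match pays for the byte-for-byte compare.
            if entry[1] == new_hash and filecmp.cmp(entry[0], path, shallow=False):
                return entry[0]
        bucket.append([path, new_hash])
        return None
```

With this layout a file whose size is unique is never even hashed, and the full 
compare runs only on a size + hash match, which per the data above is almost 
always a genuine duplicate.  A false collision then costs one extra compare 
instead of a lost file.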

On Dec 1, 2013, at 18:52 , Kyle Sluder <k...@ksluder.com> wrote:

> But as a matter of principle, it’s negligent to knowingly design a system 
> that will silently drop user data in normal operation. There are plenty of 
> times you can make a reasonable argument for “that’s good enough,” but as far 
> as I’m concerned, preserving user data is never one of them.

Seconded, thirded, …  Especially for a performance optimization when the 
effective performance cost of doing the final check is zero.

Marcel

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

