> On Nov 29, 2013, at 12:56 PM, Graham Cox <[email protected]> wrote:
> 
> The situation is, my app allows users to include images, which I’d like to 
> preserve in their original formats as far as possible. i.e. if they drag in a 
> JPEG, then I track the JPEG data and ultimately that gets embedded in my 
> file. If they drag the same image in more than once, I want to ensure that 
> the same data is used for as many copies as they include. So I compare the 
> image data with what I know about already, and if there’s a match, according 
> to the hash, then I substitute the original data I know about already.

In this scheme, if there is a hash collision, you lose user data. That should 
be a non-starter. You *must* do a full bytewise comparison in case of 
collision. 


> So the hash only needs to be computed whenever an object is added, so at that 
> time the time to compute the hash is negligible. But it’s also computed for 
> each image data read when opening the file, because even though each instance 
> is unique, it still needs to be hashed and stored in case the user 
> subsequently adds an image that matches one of these. It’s this that’s taking 
> noticeable time, for files that have lots of different images in them.

You also have another, damn-quick "hash key" that takes zero disk access to 
compute: -[NSData length].

Use the size as a primary key, then a reasonably low-collision/low-complexity 
hash algorithm as a secondary key. That seems like a reasonable compromise 
between runtime efficiency and implementation simplicity.

You could make the keys of your dictionary a custom object:

@interface ImageDataHashKey : NSObject <NSCopying>
+ (instancetype)keyForData:(NSData *)data;
@property (readonly) NSUInteger size;
@end

@implementation ImageDataHashKey
{
  NSUInteger _hash;
}

+ (instancetype)keyForData:(NSData *)data;
{
  ImageDataHashKey *key = [[self alloc] init];
  key->_hash = hashfunc(data);
}

- (NSUInteger)hash;
{
  return (NSUInteger)_hash;
}

- (BOOL)isEqual:(id)other;
{
  return [other isKindOfClass:[ImageDataHashKey class]] && self->_size == 
other->_size && self->_hash == other->_hash;
}
@end

//--

A further optimization might be to build your own cuckoo hash to avoid 
collisions entirely.

--Kyle Sluder
_______________________________________________

Cocoa-dev mailing list ([email protected])

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to [email protected]

Reply via email to