> On Nov 29, 2013, at 12:56 PM, Graham Cox <[email protected]> wrote:
>
> The situation is, my app allows users to include images, which I’d like to
> preserve in their original formats as far as possible. i.e. if they drag in a
> JPEG, then I track the JPEG data and ultimately that gets embedded in my
> file. If they drag the same image in more than once, I want to ensure that
> the same data is used for as many copies as they include. So I compare the
> image data with what I know about already, and if there’s a match, according
> to the hash, then I substitute the original data I know about already.
In this scheme, if there is a hash collision, you lose user data. That should
be a non-starter. You *must* do a full bytewise comparison in case of
collision.
> So the hash only needs to be computed whenever an object is added, so at that
> time the time to compute the hash is negligible. But it’s also computed for
> each image data read when opening the file, because even though each instance
> is unique, it still needs to be hashed and stored in case the user
> subsequently adds an image that matches one of these. It’s this that’s taking
> noticeable time, for files that have lots of different images in them.
You also have another, damn-quick "hash key" that takes zero disk access to
compute: -[NSData length].
Use the size as a primary key, then a reasonably low-collision/low-complexity
hash algorithm as a secondary key. That seems like a reasonable compromise
between runtime efficiency and implementation simplicity.
You could make the keys of your dictionary a custom object:
@interface ImageDataHashKey : NSObject <NSCopying>
+ (instancetype)keyForData:(NSData *)data;
@property (readonly) NSUInteger size;
@end
@implementation ImageDataHashKey
{
NSUInteger _hash;
}
+ (instancetype)keyForData:(NSData *)data;
{
ImageDataHashKey *key = [[self alloc] init];
key->_hash = hashfunc(data);
}
- (NSUInteger)hash;
{
return (NSUInteger)_hash;
}
- (BOOL)isEqual:(id)other;
{
return [other isKindOfClass:[ImageDataHashKey class]] && self->_size ==
other->_size && self->_hash == other->_hash;
}
@end
//--
A further optimization might be to build your own cuckoo hash to avoid
collisions entirely.
--Kyle Sluder
_______________________________________________
Cocoa-dev mailing list ([email protected])
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to [email protected]