On Feb 27, 2013, at 11:28 AM, Tony Parker <anthony.par...@apple.com> wrote:

> Out of curiosity, what do you expect to happen if your string is @"ab" or 
> something even longer, but repeated 1 million times? Your test implies that 
> the answer is 2,000,000 but in fact the answer is that it only grows one more 
> byte. The string is being de-duplicated but there is overhead associated with 
> each object in the archive. The amount seems egregious for an object that is 
> so small (a string with one character), but real world archives are rarely 
> 1-character strings repeated 1 million times. Could the overhead be improved? 
> Probably, but there are many tradeoffs to make.

Also, this kind of repetition collapses down to nearly nothing when compressed 
with any general-purpose compression algorithm such as DEFLATE (the one behind 
ZIP and gzip).
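
To make that concrete, here is a rough sketch in Python using the standard zlib module, which implements DEFLATE. The input is a hypothetical stand-in (the two bytes "ab" repeated one million times), not real NSKeyedArchiver output, so treat the numbers it prints as illustrative only.

```python
import zlib

# Hypothetical stand-in for a highly repetitive archive: the two bytes "ab"
# repeated one million times. This is NOT actual NSKeyedArchiver output.
raw = b"ab" * 1_000_000

# zlib implements DEFLATE, the algorithm used by ZIP and gzip.
# Level 9 asks for the strongest (slowest) compression.
packed = zlib.compress(raw, 9)

print(f"raw:        {len(raw):>9,} bytes")
print(f"compressed: {len(packed):>9,} bytes")
print(f"ratio:      {len(raw) / len(packed):,.0f}x")

# Round-trip check: decompression restores the original bytes exactly.
assert zlib.decompress(packed) == raw
```

On input like this the compressed size should come out at a few kilobytes against 2,000,000 bytes of input. A real archive carries per-object overhead that is less regular, so it would likely compress less dramatically.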

Most popular data formats have a lot of strictly-not-necessary repetition in 
them (view source on any web page and count how many times the string “div” or 
“href” appears!), but if size is an issue it’s generally a better idea to pipe 
them through compression, as HTTP can, rather than go to a lot of trouble in 
the codec to eliminate redundancy.

—Jens

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
