> In fact, a group of my friends coined the term "Microsoft Compression"
> to refer to the amazing phenomenon of saving the same Powerpoint
> document over and over again until the file size is as small as
> possible. We have observed that it incrementally grows with each save,
> until about 3 times the smallest size, then jumping back to the smallest
> size. So if your file won't fit on floppy, do that until it does =)

This is not due to compression at all. M$ Office doesn't really compress its 
data. In fact, it inflates it! Here's how:

It happens because, well... all non-XML M$ Office documents are saved as 
OLE compound files. "OLE compound file" is just another name for a 
supposedly lightweight but quite s***ty filesystem that lives inside a 
single file (on Linux, the closest analogy is a filesystem image mounted 
through a loop device). Office documents are small filesystems with all their 
drawbacks (and benefits where applicable ;-): filesystems usually have a 
significant amount of "slack" -- unused free space.
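
To see the "filesystem in a file" for yourself, here is a minimal sketch that peeks at the fixed header of a compound file. The signature bytes and field offsets follow the published compound file layout as I understand it; the function name `read_cfb_header` and the returned dictionary keys are made up for illustration.

```python
import struct

# Signature found in the first 8 bytes of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def read_cfb_header(header):
    """Parse a few fields from the first bytes of a compound file.

    Returns None when the data does not start with the OLE signature.
    """
    if len(header) < 76 or header[:8] != OLE_MAGIC:
        return None
    # Offsets 24..33: minor version, major version, byte order,
    # sector shift, mini sector shift (all little-endian uint16).
    _minor, major, _order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24
    )
    # Offsets 44 and 48: number of FAT sectors, first directory sector.
    fat_sectors, first_dir_sector = struct.unpack_from("<2I", header, 44)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,        # sizes are stored as shifts
        "mini_sector_size": 1 << mini_shift,
        "fat_sectors": fat_sectors,
        "first_directory_sector": first_dir_sector,
    }
```

Feed it the first 512 bytes of a .ppt or .doc, e.g. `read_cfb_header(open(path, "rb").read(512))`. Sectors, an allocation table and a directory: it really is laid out like a tiny disk.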

What M$ Office does is defragment and shrink that "filesystem" once the 
amount of slack exceeds some threshold (either a fixed amount or a multiple 
of the used space). That is why the file grows with each save and then 
suddenly drops back to its smallest size.
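
A toy model of that grow-then-shrink cycle, purely as a sketch: the numbers and the trigger rule (compact when slack exceeds `slack_limit` times the used space) are assumptions picked to mimic the observed "about 3 times the smallest size" behaviour, not the real Office logic.

```python
def simulate_saves(used, growth_per_save, slack_limit, saves):
    """Return the container size after each save.

    used            -- size of the live data (kept constant here)
    growth_per_save -- dead space each save leaves behind (assumed)
    slack_limit     -- compaction fires when slack > slack_limit * used (assumed)
    """
    sizes = []
    slack = 0
    for _ in range(saves):
        slack += growth_per_save          # each save leaves some garbage behind
        if slack > slack_limit * used:
            slack = 0                     # "defragment": drop all unused space
        sizes.append(used + slack)
    return sizes

print(simulate_saves(used=100, growth_per_save=50, slack_limit=2, saves=6))
# prints [150, 200, 250, 300, 100, 150]
```

The size climbs to 3x the live data, then snaps back to the minimum on the next save, which is exactly the "Microsoft Compression" effect described in the quote.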

> What shits me about MC is that when you exceed your disc quota at uni,
> Powerpoint silently fux up your document with no hints. So you could
> lose your work (happened to me just before a presentation once).

That's because, like everyday filesystems, OLE compound documents (which 
M$ Office treats as filesystems) may be kept mounted. While they are 
mounted, they can also grow dynamically (when you add data to them). 
That's the reason why Office apps:
1. lock the files when using them (an otherwise completely unnecessary thing)
2. may grow the files up to a certain limit -- the free space reclamation 
algorithm in the M$ OLE compound file implementation is pretty trivial and 
doesn't do a good job at all
3. will corrupt the files if they cannot grow them -- because they keep the 
"filesystem" mounted, and they just treat it as a real filesystem of 
unlimited size -- they don't try to verify whether it can be grown or not, 
and they add/change the data in the file even when you're not saving it.
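
For contrast with point 3, here is a sketch of a save routine that never touches the original file until the new copy is completely on disk. This is not what Office does; `safe_save` is a hypothetical helper shown only to illustrate why in-place growth is the risky part.

```python
import os
import tempfile

def safe_save(path, data):
    """Write data to a temp file next to path, then atomically swap it in.

    If the write fails (quota exceeded, disk full, bad data), the original
    file at path is left exactly as it was.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())        # make sure the bytes hit the disk
        os.replace(tmp_path, path)        # swap the finished copy into place
    except BaseException:
        # Clean up the half-written temp file; the original is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The cost is needing room for a second full copy during the save, but running out of quota then produces an error instead of a mangled presentation.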

This is somewhat simplified, but it's sad and true.

Cheers, Kuba Ober
