On Sat, 2002-12-21 at 00:01, John Levon wrote:
> On Fri, Dec 20, 2002 at 09:49:56AM +0100, Andre Poenitz wrote:
>
> > I'd rather make .lyx gzipped by default. Doesn't Staroffice do something
> > similar?
>
> Lets look at the bloat first. You don't fix kernel bugs first by running
> 3 redundant machines ...
I agree in principle, but I'd like to see tests showing which activity is actually more productive. Remember that shrinking the symbol names shouldn't have much effect on the gzipped file size; you'd have to start trimming away symbols entirely to get a real benefit.

As an example of what I mean, I prepared a lyx file with two tables. There was hardly anything in this document, yet its size is 11 kB. Piped through gzip it's just over 1 kB. (All files mentioned are attached in .lyz form.)

Then I took the original and did a round of find/replace to swap symbol names for abbreviations. That shaved off about 2.5 kB with no loss of information, but gzipping it saved only a further 52 bytes. This is because gzip stores only one copy of each repeated string and emits a short back-reference everywhere else it occurs, so shortening a symbol only shortens the single stored copy.

52 bytes isn't worth the readability impact, since we want people to be able to do what they like with the file, including using it in 50 years, or grepping it now. So I would say this: apart from obvious shortening of bloated symbols, leave them readable (and compatible!). Provided gzipping becomes the standard, that's a tiny penalty for a large gain. But as for eliminating symbols altogether through more intelligent use of defaults: go for it! That will genuinely shave off data.

Before making that statement definitive, though, I had another go. Starting from the already smaller file, I removed what looked like redundant info, touching only the tables, and shaved off a further 2.3 kB. Yet the compressed result actually grew by 26 bytes. I have no definitive explanation for why, but gzip evidently could no longer exploit patterns that were present before.

So in this very simple example there was hardly any gain in the compressed file, despite work that shaved up to 5 kB off the original. It would be very handy if whoever ends up coding the file I/O ran some real tests on publicly available documents, to show how big a saving the bloat reduction actually buys if the file is going to be compressed anyway. And maybe a comparison between gzip and bzip2? Though I think bzip2 is less platform independent.

I would note, though, that running gzip on a file from the command line is different from piping through gzip. Running it on the file stores the original name inside the archive; piping doesn't. In practice people should be able to rename x.lyz to y.lyz, and extracting y.lyz shouldn't give them back x.lyx.

So let's *not* look at the bloat first, let's look at gzipping first =)

I've appended a few small sketches below for anyone who wants to reproduce or extend these measurements.
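To make the back-reference point concrete, here's a minimal sketch in Python (mine, not one of the attached tests; the tag text is made up to stand in for repeated .lyx symbols). It compresses the same document written once with a long symbol name and once with an abbreviation:

```python
import zlib

# Two documents with identical structure: one uses a long, readable
# symbol name, the other a terse abbreviation. Each name occurs 200
# times, mimicking a tag repeated throughout a .lyx file.
long_doc = ("\\begin_inset Tabular cell content here\n" * 200).encode()
short_doc = ("\\bi Tabular cell content here\n" * 200).encode()

print(len(long_doc), len(short_doc))     # raw sizes differ substantially
print(len(zlib.compress(long_doc)), len(zlib.compress(short_doc)))
# The compressed sizes come out nearly identical: DEFLATE stores the
# repeated name once and emits cheap back-references for every later
# occurrence, so shortening the name only shrinks that single copy.
```

The two compressed sizes should land within a few bytes of each other, which matches the 52-byte result above.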
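And for the "real tests on publicly available documents" suggestion, a throwaway harness along these lines would do. It's a sketch, nothing LyX-specific; it just measures whatever files you pass on the command line:

```python
import bz2, gzip, sys

# Print raw / gzip / bzip2 sizes for each file named on the command
# line, so the bloat-vs-compression trade-off can be measured on real
# documents instead of guessed at.
for path in sys.argv[1:]:
    with open(path, "rb") as f:
        data = f.read()
    gz = len(gzip.compress(data))
    bz = len(bz2.compress(data))
    print(f"{path}: raw={len(data)} gzip={gz} bzip2={bz}")
```

Run it as, say, `python sizes.py table-test.lyx table-test-small.lyx` (the script name is whatever you save it as) and you get one comparison line per document.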
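Finally, on the stored-filename point: the gzip header (RFC 1952) carries an optional FNAME field, which `gzip foo.lyx` sets but a pipe through gzip does not. A quick sketch to inspect it, again Python and again mine, not anything LyX ships:

```python
import sys

# Report whether a .gz / .lyz file has an embedded original name
# (the FNAME header field from RFC 1952).
FEXTRA, FNAME = 0x04, 0x08

with open(sys.argv[1], "rb") as f:
    header = f.read(10)                     # fixed 10-byte gzip header
    assert header[:2] == b"\x1f\x8b", "not a gzip file"
    flags = header[3]
    if flags & FEXTRA:                      # skip the optional extra field
        xlen = int.from_bytes(f.read(2), "little")
        f.read(xlen)
    if flags & FNAME:                       # zero-terminated original name
        name = bytearray()
        while (b := f.read(1)) and b != b"\x00":
            name.extend(b)
        print("stored name:", name.decode("latin-1"))
    else:
        print("no stored name (probably written via a pipe)")
```

So if LyX ends up writing .lyz files itself, it should either omit FNAME or keep it consistent with the archive's current name, so that renaming behaves the way users expect.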
Have fun,
Darren

table-test-small.lyz
Description: GNU Zip compressed data
table-test.lyz
Description: GNU Zip compressed data
table-test-cropped.lyz
Description: GNU Zip compressed data