On Sat, 2002-12-21 at 00:01, John Levon wrote:
> On Fri, Dec 20, 2002 at 09:49:56AM +0100, Andre Poenitz wrote:
> 
> > I'd rather make .lyx gzipped by default. Doesn't Staroffice do something
> > similar?
> 
> Lets look at the bloat first. You don't fix kernel bugs first by running
> 3 redundant machines ...

I agree in principle, but I would like to see tests showing which
activity is actually more productive. Remember that shrinking the
symbol names shouldn't have much of an effect on the gzipped file size.
You'd have to start trimming away symbols entirely to get a real
benefit.

As an example of what I mean, I prepared a sample LyX file with two
tables. There is hardly anything in this document, yet its size is 11
kB. Piped through gzip it is just over 1 kB. (All files mentioned are
attached in .lyz form.)

Then I took the original and did a round of find/replace to swap symbol
names for abbreviations. I shaved off about 2.5 kB with no loss of
information. But gzipping the result saved only 52 bytes.

This is because gzip effectively stores only one copy of each repeated
string and emits short back-references to it everywhere else it occurs
in the document. So shortening a symbol only shortens that single
stored copy. 52 bytes isn't worth the readability impact, since we want
people to be able to do what they like to the file, including use it in
50 years, or grep it now.
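
If anyone wants to check this on their own machine, here is a quick
sketch (Python, standard gzip module only; the symbol name and the cell
count are made up for illustration) that compresses the same skeleton
with a long and with an abbreviated symbol name:

import gzip

def doc(symbol):
    # A fake document body: the same symbol name repeated in 200 "cells".
    return (("%s default default\n" % symbol) * 200).encode()

for name in ("multicolumn_alignment", "mca"):
    raw = doc(name)
    packed = gzip.compress(raw, 9)
    print("%-22s raw %5d B  gzipped %4d B" % (name, len(raw), len(packed)))

The raw sizes differ by a few kilobytes, but the gzipped sizes should
come out within a couple of dozen bytes of each other, because only the
first copy of the name is stored in full.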

So I would say this: apart from obvious shortening of bloated symbols,
leave them readable (and compatible!). Provided gzipping becomes the
standard, that's a good deal: a tiny penalty in compressed size for a
large gain in readability.

But as for eliminating symbols altogether by more intelligent use of
defaults, I'd say: go for it! That will shave off some data.

But before making that statement definitive, I had another go. Starting
from the already smaller file, I removed what looked like redundant
information, touching only the tables, and shaved off a further 2.3 kB.
Yet on compressing it, the file actually grew by 26 bytes. I have no
definitive explanation for why, but gzip evidently could no longer take
advantage of patterns that were present before.

So in this very simple example there was hardly any gain in the
compressed file, despite the work put into shaving up to 5 kB off the
original. It would be very handy if whoever is coding the file I/O
could run some real tests on publicly available documents, to show how
big a saving is actually to be had from reducing the bloat if the file
is going to be compressed anyway.
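
Something along these lines would do as a starting point (a Python
sketch; it just reports raw versus gzipped size for whatever documents
you point it at):

import gzip
import os
import sys

# Report raw vs. gzipped size for each document named on the command line.
for path in sys.argv[1:]:
    with open(path, "rb") as f:
        data = f.read()
    packed = gzip.compress(data, 9)
    print("%-30s raw %7d B  gzipped %7d B  ratio %.2f"
          % (os.path.basename(path), len(data), len(packed),
             len(packed) / float(len(data))))

Run that over a decent corpus of real .lyx files and we'd know whether
the bloat still matters once compression is in the picture.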

And maybe a comparison between gzip and bzip2? Though I think bzip2 is
less portable across platforms. One thing worth noting from the command
line: running gzip on a file is not the same as piping through gzip.
Running gzip on the file stores the original filename in the gzip
header; piping does not. In practice people should be able to rename
x.lyz to y.lyz, and extracting y.lyz should give them y.lyx, not x.lyx.
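
For the curious: the stored name lives in the FNAME field of the gzip
header (RFC 1952). A few lines of Python can show whether a given .lyz
carries one (this assumes no FEXTRA field, which the gzip tool never
writes anyway):

import sys

# Peek at the gzip header: magic(2) method(1) flags(1) mtime(4) xfl(1)
# os(1), then a NUL-terminated original filename if flag bit 0x08
# (FNAME) is set.
for path in sys.argv[1:]:
    with open(path, "rb") as f:
        header = f.read(10)
        if header[:2] != b"\x1f\x8b":
            print("%s: not a gzip file" % path)
        elif header[3] & 0x08:
            name = b""
            c = f.read(1)
            while c and c != b"\x00":
                name += c
                c = f.read(1)
            print("%s: stores original name %r" % (path, name))
        else:
            print("%s: no original name stored" % path)

So if we adopt gzipped .lyx, we should either write the stream
ourselves or use gzip -n, rather than trust whatever name happens to
sit in the header.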


So let's *not* look at the bloat first, let's look at gzipping first =)


> regards
> john

Have fun,
Darren

Attachment: table-test-small.lyz
Description: GNU Zip compressed data

Attachment: table-test.lyz
Description: GNU Zip compressed data

Attachment: table-test-cropped.lyz
Description: GNU Zip compressed data
