> In reading
> http://apachetoday.com/news_story.php3?ltsn=2000-09-27-001-01-OP-CY-LF
>
> I came across the following guideline for writing Apache documentation:
> HTML tags should be lowercase wherever possible. In other
> words, '<a href="foo.html">Link</a>' is preferred over
> '<A HREF="foo.html">Link</A>'. This is because lowercase
> letters result in more efficient space savings when documents
> are compressed.
>
> I'm trying to figure out how this could be true.
> /r$
>
While I can't imagine that there's anything special about lowercase
per se, I can certainly imagine a compression scheme giving a more favorable
encoding to characters in the dominant case of the overall document.
A monocase document never carries more information than its mixed-case
original, and usually carries less, since the case distinctions are gone.
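
As a rough illustration of the information argument, here is a minimal
Python sketch that measures per-character Shannon entropy before and after
lowercasing. The function name char_entropy and the sample string are made up
for this example, and character-frequency entropy is only a crude stand-in
for what a real compressor exploits.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Zeroth-order Shannon entropy of text, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Hypothetical sample mixing the two tag styles from the guideline.
sample = '<A HREF="foo.html">Link</A> and <a href="bar.html">Other</a>'

mixed_bits = char_entropy(sample)
lower_bits = char_entropy(sample.lower())
print(f"mixed case: {mixed_bits:.3f} bits/char")
print(f"lowercased: {lower_bits:.3f} bits/char")
```

Folding two symbols into one can never raise this figure, which is the sense
in which the lowercased text has less information in it.
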
This seems to hold in practice as well as in theory, at least for GNU zip
(gzip). Lowercasing an entire file made the compressed output about 5%
smaller (13451 vs. 14171 bytes):
crypto$ ls -ld xyzzy.txt
-rw-rw-r-- 1 mab mab 46552 Sep 29 16:39 xyzzy.txt
crypto$ tr A-Z a-z < xyzzy.txt > xyzzy.lower.txt
crypto$ gzip xyzzy.txt
crypto$ gzip xyzzy.lower.txt
crypto$ ls -ld xyzzy*
-rw-rw-r-- 1 mab mab 13451 Sep 29 16:40 xyzzy.lower.txt.gz
-rw-rw-r-- 1 mab mab 14171 Sep 29 16:39 xyzzy.txt.gz
crypto$
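
For anyone who wants to repeat the comparison without the shell, here is a
minimal Python sketch of the same experiment. The helper name gzip_sizes and
the generated sample are assumptions of this example, not part of the test
above, and the byte counts will not match the command-line gzip exactly
because the header contents differ.

```python
import gzip
from typing import Tuple


def gzip_sizes(data: bytes, level: int = 6) -> Tuple[int, int]:
    """Return (size as given, size after lowercasing), both gzip-compressed.

    bytes.lower() only touches ASCII letters, like `tr A-Z a-z`.
    Level 6 is the default of the gzip command-line tool.
    """
    lowered = data.lower()
    as_given = len(gzip.compress(data, compresslevel=level))
    lowercased = len(gzip.compress(lowered, compresslevel=level))
    return as_given, lowercased


# Hypothetical input: the same links written in both tag styles.
sample = b"".join(
    b'<A HREF="page%d.html">Link %d</A>\n<a href="page%d.html">link %d</a>\n'
    % (i, i, i, i)
    for i in range(200)
)

as_given, lowercased = gzip_sizes(sample)
print(f"as given:   {as_given} bytes")
print(f"lowercased: {lowercased} bytes")
```

To try it on a real file, pass open("xyzzy.txt", "rb").read() to gzip_sizes
in place of the generated sample.
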
-matt