> In reading
>     http://apachetoday.com/news_story.php3?ltsn=2000-09-27-001-01-OP-CY-LF
> 
> I came across the following guideline for writing Apache documentation:
>     HTML tags should be lowercase wherever possible. In other
>     words, '<a href="foo.html">Link</a>' is preferred over
>     '<A HREF="foo.html">Link</A>'. This is because lowercase
>     letters result in more efficient space savings when documents
>     are compressed.
> 
> I'm trying to figure out how this could be true.
>       /r$
> 

While I can't imagine that there's anything special about lower case
per se, I can certainly imagine a compression scheme assigning shorter
codes to characters in the dominant case of the overall document.
A monocase document certainly carries less information than a mixed-case
one.
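
As a rough illustration of the information argument, here is a minimal
Python sketch that compares the per-character (zeroth-order) Shannon
entropy of a string before and after lowercasing. The sample string and
the function name are made up for illustration, and this is only a crude
proxy: it is not what gzip itself computes.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Zeroth-order Shannon entropy of the character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Hypothetical sample mixing upper- and lowercase tags.
sample = '<A HREF="foo.html">Link</A> and <a href="bar.html">link</a>'

mixed = entropy_bits_per_char(sample)
lower = entropy_bits_per_char(sample.lower())

print(f"mixed case: {mixed:.3f} bits/char")
print(f"lower case: {lower:.3f} bits/char")
```

Folding case merges symbols such as 'A' and 'a' into one, so this
measure can only stay the same or go down.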

This seems to be true in practice as well as in theory (at least for
GNU zip); lowercasing an entire file saves about 5% after compression:

crypto$ ls -ld xyzzy.txt
-rw-rw-r--   1 mab      mab         46552 Sep 29 16:39 xyzzy.txt
crypto$ tr A-Z a-z < xyzzy.txt > xyzzy.lower.txt
crypto$ gzip xyzzy.txt
crypto$ gzip xyzzy.lower.txt 
crypto$ ls -ld xyzzy*
-rw-rw-r--   1 mab      mab         13451 Sep 29 16:40 xyzzy.lower.txt.gz
-rw-rw-r--   1 mab      mab         14171 Sep 29 16:39 xyzzy.txt.gz
crypto$ 
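
For anyone without gzip handy, the same comparison can be sketched in
Python with the standard zlib module, which uses the same DEFLATE
algorithm as gzip but a smaller header, so the byte counts will not
match gzip's exactly. The sample text and the helper name are
assumptions for illustration; the size of the difference depends
entirely on the input.

```python
import zlib


def deflated_size(data: bytes) -> int:
    """Size in bytes of the data after zlib (DEFLATE) compression."""
    return len(zlib.compress(data, 9))


# Hypothetical sample; substitute the bytes of a real document to get
# a meaningful comparison.
sample = (
    b'<P>See <A HREF="foo.html">the Foo page</A> and '
    b'<a href="bar.html">the bar page</a>.</P>\n'
) * 200

mixed_size = deflated_size(sample)
# bytes.lower() folds ASCII letters only, like tr A-Z a-z above.
lower_size = deflated_size(sample.lower())

print(f"original:   {len(sample)} bytes")
print(f"mixed case: {mixed_size} bytes compressed")
print(f"lower case: {lower_size} bytes compressed")
```

To try it on a real file, read its contents with
open(path, "rb").read() and pass that in place of the sample.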

-matt


