On 27.08.2012 03:40, Tim Chase wrote:
> So it looks like some python-list@ archiving process is double
> gzip'ing the archives. Can anybody else confirm this and get the
> info to the right people?
In January, "random joe" noticed the same problem[1].
I think Anssi Saari[2] was right in saying that there is something
wrong in the browser or server setup, because I see the same
behaviour with Firefox, Chromium, wget and curl.
$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 2 03:27 wget_2012-July.txt.gz
The browsers get a double gzipped file (size 747850) whereas the
download utilities get a normal gzipped file (size 748041).
After looking at the HTTP request and response headers I've noticed that
the browsers accept compressed data ("Accept-Encoding: gzip, deflate")
whereas wget/curl by default don't. After adding that header to
wget/curl they get the same double gzipped file as the browsers do:
$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:40 curl_encoding_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 2 03:27 wget_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 2 03:27 wget_encoding_2012-July.txt.gz
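To check how many gzip layers a downloaded file really has, something like the following rough sketch should do. The function name count_gzip_layers and the max_layers limit are my own inventions for illustration; the check simply looks for the gzip magic bytes (0x1f 0x8b) and unwraps one layer at a time.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def count_gzip_layers(data: bytes, max_layers: int = 10) -> int:
    """Return how many nested gzip layers wrap `data` (hypothetical helper)."""
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but could not be decompressed; stop counting.
            break
        layers += 1
    return layers


# In-memory stand-ins for the archive, so the sketch runs without a download.
text = b"From python-list ...\n"
once = gzip.compress(text)
twice = gzip.compress(once)
print(count_gzip_layers(text), count_gzip_layers(once), count_gzip_layers(twice))
```

If my reading of the listing is right, feeding it the bytes of firefox_2012-July.txt.gz should report two layers and curl_2012-July.txt.gz one, but I have only reasoned that from the file sizes and headers above.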
I think the following is happening:
If you send the "Accept-Encoding: gzip, deflate" header, the server
gzips the file a second time (which is arguably unnecessary) and responds
with "Content-Encoding: gzip" and "Content-Type: application/x-gzip"
(which is IMHO correct according to RFC 2616, sections 14.11 and 14.17[3]).
But because many servers apparently don't set correct headers, the
default behaviour of most browsers nowadays is to ignore the
Content-Encoding for gzip files (application/x-gzip; see the bug reports
for Firefox[4] and Chromium[5]) and not to uncompress the outer layer,
leading to a double gzipped file in this case.
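The interaction I am describing can be simulated without any network access. This is only a toy model of my guess, not the actual server or browser code: serve() and save_as_client() are hypothetical functions, and the flag ignore_encoding_for_gzip_type stands for the browser workaround from the bug reports.

```python
import gzip


def serve(stored_file: bytes, accept_encoding: str):
    """Hypothetical server: compresses the response if the client accepts gzip,
    even though the stored file is already a .gz archive."""
    headers = {"Content-Type": "application/x-gzip"}
    body = stored_file
    if "gzip" in accept_encoding:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


def save_as_client(headers, body: bytes, ignore_encoding_for_gzip_type: bool) -> bytes:
    """Hypothetical client: returns the bytes that end up on disk."""
    encoded = headers.get("Content-Encoding") == "gzip"
    is_gzip_type = headers.get("Content-Type") == "application/x-gzip"
    if encoded and not (ignore_encoding_for_gzip_type and is_gzip_type):
        body = gzip.decompress(body)  # strip the transfer layer
    return body


# Stand-in for the real 2012-July.txt.gz archive.
archive = gzip.compress(b"From python-list ...\n")

# No Accept-Encoding header (wget/curl defaults): file arrives as stored.
headers, body = serve(archive, "")
plain = save_as_client(headers, body, ignore_encoding_for_gzip_type=False)

# Browser-like client: accepts gzip but keeps the outer layer for x-gzip.
headers, body = serve(archive, "gzip, deflate")
browser = save_as_client(headers, body, ignore_encoding_for_gzip_type=True)

# A client that honours Content-Encoding would get the stored file back.
strict = save_as_client(headers, body, ignore_encoding_for_gzip_type=False)

print(plain == archive, browser == archive, strict == archive)
```

In this model only the browser-like client ends up with a double gzipped file, which matches the sizes in the listing above.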
Bye, Andreas
[1] http://mail.python.org/pipermail/python-list/2012-January/617983.html
[2] http://mail.python.org/pipermail/python-list/2012-January/618211.html
[3] http://www.ietf.org/rfc/rfc2616
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c5
[5] http://code.google.com/p/chromium/issues/detail?id=47951#c9