On 27.08.2012 03:40, Tim Chase wrote:
> So it looks like some python-list@ archiving process is double
> gzip'ing the archives.  Can anybody else confirm this and get the
> info the right people?

In January, "random joe" noticed the same problem[1].
I think Anssi Saari[2] was right in saying that there is something wrong in the browser or server setup, because I see the same behaviour with Firefox, Chromium, wget and curl.

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz

The browsers get a double-gzipped file (747850 bytes), whereas the download utilities get a normally gzipped file (748041 bytes).

After looking at the HTTP request and response headers, I noticed that the browsers accept compressed data ("Accept-Encoding: gzip, deflate") whereas wget and curl by default don't. After adding that header to wget and curl, they get the same double-gzipped file as the browsers do:

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:40 curl_encoding_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug  2 03:27 wget_encoding_2012-July.txt.gz
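
For anyone who wants to repeat the comparison from Python instead of wget/curl, here is a minimal sketch. The helper names build_request and fetch_raw are made up for illustration, and the archive URL is left as a parameter because it is not quoted above. As far as I know, urllib.request does not undo a Content-Encoding on its own, so the bytes returned should be what the server actually sent.

```python
import urllib.request


def build_request(url, accept_gzip):
    """Build a GET request, optionally advertising gzip/deflate support.

    Hypothetical helper: with accept_gzip=True it sends the same
    Accept-Encoding header that the browsers send.
    """
    headers = {}
    if accept_gzip:
        headers["Accept-Encoding"] = "gzip, deflate"
    return urllib.request.Request(url, headers=headers)


def fetch_raw(url, accept_gzip):
    """Return (Content-Encoding, Content-Type, body bytes) for url.

    The body is returned exactly as received; nothing is decompressed here.
    """
    with urllib.request.urlopen(build_request(url, accept_gzip)) as resp:
        return (
            resp.headers.get("Content-Encoding"),
            resp.headers.get("Content-Type"),
            resp.read(),
        )
```

Calling fetch_raw(url, False) and fetch_raw(url, True) on the same archive URL and comparing len() of the two bodies should show whether the server adds a second gzip layer when asked for compressed data.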

I think the following is happening:
If you send the "Accept-Encoding: gzip, deflate" header, the server gzips the file a second time (which is arguably unnecessary) and responds with "Content-Encoding: gzip" and "Content-Type: application/x-gzip" (which is IMHO correct according to RFC 2616, sections 14.11 and 14.17[3]). But because many servers apparently don't set correct headers, the default behaviour of most browsers nowadays is to ignore the Content-Encoding for gzip files (application/x-gzip; see the bug reports for Firefox[4] and Chromium[5]) and not to uncompress the outer layer, which leads to a double-gzipped file in this case.
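
The layering described above can be reproduced locally without the server. The following is only a sketch under my assumptions: unwrap_gzip is a made-up helper, it recognises a gzip layer purely by the two magic bytes 0x1f 0x8b, and the sample text is a stand-in for the real archive.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def unwrap_gzip(data, max_layers=5):
    """Strip gzip layers from data; return (payload, layers_removed).

    Hypothetical helper: a layer is detected only by its magic bytes,
    and max_layers guards against unbounded unwrapping.
    """
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Stand-in for the mailbox text stored in the archive.
text = b"From python-list at python.org\n" * 1000

archive = gzip.compress(text)    # the .txt.gz file as stored on the server
served = gzip.compress(archive)  # what arrives when Content-Encoding: gzip is added

payload_plain, layers_plain = unwrap_gzip(archive)
payload_double, layers_double = unwrap_gzip(served)
```

A client that honours Content-Encoding would remove the outer layer and save the equivalent of archive; one that ignores it for application/x-gzip saves served, which needs two rounds of gunzip to get back to the text.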

Bye, Andreas

[1] http://mail.python.org/pipermail/python-list/2012-January/617983.html

[2] http://mail.python.org/pipermail/python-list/2012-January/618211.html

[3] http://www.ietf.org/rfc/rfc2616

[4] https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c5

[5] http://code.google.com/p/chromium/issues/detail?id=47951#c9
