cinap_len...@felloff.net wrote:
 |found it. the server sends Content-Encoding header which causes hget
 |to add a decompression filter, so you get as output a tarball.
 |
 |<- Content-Type: application/x-gzip
 |<- Content-Encoding: gzip
 |
 |this is clearly silly, as the file is already compressed, \
 |and decompressing it
 |will not yield the indicated content-type: application/x-gzip, \
 |but a tarball.
 |
 |maybe the w3c is wrong, or is ignored in practice or we need to handle gzip
 |specially. the problem is that some webservers compress the \

The problem is that IANA doesn't support a tar-gz MIME type, so
that mime.types(5) (tika [1] for Apache) will return "silly"
values, as in

  application/gzip        tgz gz emz
  application/x-bzip2     bz2 tbz2 boz
  # EXTENSION .tbz
  application/x-xz        xz tbz
  application/x-tar       tar

  [1] http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

 |data, like you request
 |a html file and it gives you gzip back, thats why hget uncompresses.

mime.types(5) (re-)evaluating expanded content seems what IANA
has in mind with its decision (it would be all too simple if it
would just work (tm)).

--steffen
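One way to "handle gzip specially", as suggested in the quoted text, could look roughly like the sketch below. It is not hget's code. The names `GZIP_TYPES`, `should_decode` and `body_to_save` are made up for illustration, and the set of "already gzip" media types is an assumption. The idea is to undo Content-Encoding: gzip only when the Content-Type does not itself announce gzip data.

```python
import gzip

# Assumption: media types that already denote gzip-compressed data.
GZIP_TYPES = {"application/gzip", "application/x-gzip"}


def should_decode(headers):
    """Return True if the body should be gunzipped before it is saved.

    `headers` is a plain dict keyed by the exact header names used below
    (hypothetical; a real client would look them up case-insensitively).
    """
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return False
    # Drop parameters such as "; charset=..." before comparing.
    ctype = headers.get("Content-Type", "").split(";")[0].strip().lower()
    # Heuristic: if the type is itself gzip, the encoding header is most
    # likely a misconfigured duplicate, so leave the body untouched.
    return ctype not in GZIP_TYPES


def body_to_save(headers, raw):
    """Apply the heuristic to the raw bytes of a response body."""
    return gzip.decompress(raw) if should_decode(headers) else raw
```

With the two headers from the quoted trace, `body_to_save` would hand back the bytes untouched, so the saved file stays a .tar.gz. A text/html response sent with Content-Encoding: gzip would still be expanded.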