Rogério Brito wrote:
> I believe that you meant to file this as a Python bug and I think that the
> severity is, quite frankly, lower than normal...
I don't think this is a python bug. It's reasonable for python's gzip
library to fail when presented with corrupted data. It does not know
it's being used to download a url. Perhaps it should have a mode where
it tries to extract as much data as it can, in case its caller wants to
try to be robust.
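To make that idea concrete, here is a rough sketch of what such a "best effort" mode might look like. It is not anything the gzip module offers; salvage_gzip is a hypothetical helper, and it assumes Python 3 (for the eof attribute) and that feeding zlib one chunk at a time is an acceptable way to keep whatever was decoded before an error.

```python
import zlib


def salvage_gzip(data, chunk_size=1024):
    """Return as many decompressed bytes as can be recovered from data.

    Hypothetical helper for illustration; not part of the gzip module.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    recovered = []
    for start in range(0, len(data), chunk_size):
        try:
            recovered.append(decomp.decompress(data[start:start + chunk_size]))
        except zlib.error:
            # Corrupt input: stop, but keep what was decoded before the error.
            break
        if decomp.eof:
            # End of the gzip stream; anything after it is ignored.
            break
    return b"".join(recovered)
```

A truncated stream simply yields whatever prefix could be decoded, and input that is not gzip at all yields an empty byte string. Whether silently returning partial data is the right thing to do is up to the caller.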
I think this is a bug in youtube-dl though, because of this code:
std_headers = {
    ...
    'Accept-Encoding': 'gzip, deflate',
}

if resp.headers.get('Content-encoding', '') == 'gzip':
    content = resp.read()
    gz = gzip.GzipFile(fileobj=io.BytesIO(content), mode='rb')
    try:
        uncompressed = io.BytesIO(gz.read())
    except IOError as original_ioerror:
        # There may be junk add the end of the file
        # See http://stackoverflow.com/q/4928560/35070 for details
        for i in range(1, 1024):
            try:
                gz = gzip.GzipFile(fileobj=io.BytesIO(content[:-i]),
                                   mode='rb')
                uncompressed = io.BytesIO(gz.read())
            except IOError:
                continue
            break
        else:
            raise original_ioerror
It's encouraging gzip to be used (rather than deflate or no compression),
and it already contains workarounds for similar problems. This code
smells.
There is probably a python library that implements this robustly.
I tried python-urllib3:
joey@darkstar:~>python
Python 2.7.14 (default, Sep 17 2017, 18:50:44)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> headers = {'Accept-Encoding': 'gzip'}
>>> r = http.request('GET', 'http://www.debian.org/', headers=headers)
>>> r.headers.get("Content-Encoding")
'gzip'
>>> len(r.data)
14871
So that seems to work, I think because it uses zlib to decompress the data
rather than the gzip module.
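For illustration, here is a minimal sketch of the zlib approach applied to the "junk at the end" case that the youtube-dl workaround loops over. The function name is made up, and this is only my understanding of the technique, not a copy of what urllib3 does internally.

```python
import zlib


def gunzip_with_trailing_junk(content):
    """Decompress a gzip body and report any bytes found after the stream.

    Hypothetical helper; returns a (body, junk) tuple.
    """
    # 16 + MAX_WBITS selects gzip framing instead of a raw zlib stream.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    body = decomp.decompress(content)
    # zlib stops at the end of the gzip stream; whatever follows it is
    # left untouched in unused_data instead of raising an error.
    return body, decomp.unused_data
```

With this, there is no need to retry with up to 1023 different truncations of the content; the trailing bytes are just set aside.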
--
see shy jo

