Stefano Mazzucco added the comment:

Martin, thanks for elaborating on my thoughts!

I have dug a bit deeper into Python2's urllib code with pdb, and I think I have 
narrowed the issue down to what open_http does.

In my example code, replacing opener.open(url) with opener.open_http(url) gives 
the same problem.

I realize I did not provide you with the output of the script, so here it is:

* Python 2.7.10

python urllib_error.py
('Trying to open', 'https://www.python.org')
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 364, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 381, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 386, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 501, 'Not Implemented', <httplib.HTTPMessage instance at 0x7f875a67b950>)

* Python 3.4.3

python urllib_error.py
Trying to open https://www.python.org
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1805, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1801, in _open_generic_http
    response.status, response.reason, response.msg, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1821, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1826, in http_error_default
    raise HTTPError(url, errcode, errmsg, headers, None)
urllib.error.HTTPError: HTTP Error 501: Not Implemented

When I unwrap the contents of httplib.HTTPMessage, the error page returned by 
the squid proxy says:

-------------------------------------------------------
ERROR
The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: 
https://www.python.org

    Unsupported Request Method and Protocol

Squid does not support all request methods for all access protocols. For 
example, you can not POST a Gopher request.
-------------------------------------------------------

Looking at Python2's implementation of URLopener's open_http, I can get an even 
more minimal failing example limited to httplib:


import httplib

# The proxy is contacted directly over plain HTTP.
host = 'proxy.corp.com:8181'  # this is not the actual proxy

# The full HTTPS URL is sent as the request target, as open_http does.
selector = 'https://www.python.org'

print("Trying to open", selector)

h = httplib.HTTP(host)
h.putrequest('GET', selector)
h.putheader('User-Agent', 'Python-urllib/1.17')
h.endheaders(None)
errcode, errmsg, headers = h.getreply()

print(errcode, errmsg)
print(headers.items())


Running the script on Python 2.7.10 prints:

('Trying to open', 'https://www.python.org')
(501, 'Not Implemented')
[('content-length', '3069'),
 ('via', '1.0 proxy.corp.com (squid/3.1.6)'),
 ('x-cache', 'MISS from proxy.corp.com'),
 ('content-language', 'en'),
 ('x-squid-error', 'ERR_UNSUP_REQ 0'),
 ('x-cache-lookup', 'NONE from proxy.corp.com:8181'),
 ('vary', 'Accept-Language'),
 ('server', 'squid/3.1.6'),
 ('proxy-connection', 'close'),
 ('date', 'Fri, 10 Jul 2015 09:27:14 GMT'),
 ('content-type', 'text/html'),
 ('mime-version', '1.0')]
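
For comparison, here is a minimal sketch of how the same resource could be requested through a CONNECT tunnel instead of a plain GET carrying an https:// URL. It is written for Python 3's http.client (httplib in 2.7 also has set_tunnel). The proxy address is the same placeholder as in the script above, and the helper names make_tunnelled_connection and fetch_status are made up for illustration. That the proxy accepts CONNECT to port 443 is an assumption, not something verified here.

```python
import http.client

# Placeholder values: the proxy address is not the real one.
PROXY_HOST = "proxy.corp.com"
PROXY_PORT = 8181
TARGET_HOST = "www.python.org"
TARGET_PORT = 443


def make_tunnelled_connection():
    """Prepare an HTTPS connection that goes through the proxy via CONNECT.

    No socket is opened here; the tunnel is only set up when the first
    request is sent.
    """
    # The connection object points at the proxy ...
    conn = http.client.HTTPSConnection(PROXY_HOST, PROXY_PORT, timeout=10)
    # ... and set_tunnel() records the real destination for the CONNECT step.
    conn.set_tunnel(TARGET_HOST, TARGET_PORT)
    return conn


def fetch_status(conn, path="/"):
    """Send a GET through the tunnel and return (status, reason)."""
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()
```

Calling fetch_status(make_tunnelled_connection()) behind the proxy would show whether the tunnelled request gets past the 501 that the plain GET triggers.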


As I said, I found out about this when using buildout to download files over 
HTTPS.

Buildout uses urllib.urlretrieve on Python2 and urllib.request.urlretrieve on 
Python3. I guess the latter was fixed in issue 1424152, which would explain why 
I can download with buildout on Python3.
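
As a rough illustration of the handler-based path on Python 3, the sketch below builds an opener with an explicit ProxyHandler and uses it to download a file. The proxy URL is the placeholder from above, and build_proxy_opener and download are hypothetical helper names. Whether this is exactly the code path buildout ends up using has not been checked.

```python
import shutil
import urllib.request

# Placeholder proxy URL; not the real proxy.
PROXY_URL = "http://proxy.corp.com:8181"


def build_proxy_opener(proxy_url=PROXY_URL):
    """Return an opener that sends http and https requests via the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


def download(url, filename, opener=None):
    """Fetch url through the opener and write the body to filename."""
    opener = opener or build_proxy_opener()
    with opener.open(url, timeout=30) as response:
        with open(filename, "wb") as out:
            shutil.copyfileobj(response, out)
    return filename
```

For example, download("https://www.python.org", "index.html") would exercise the HTTPS-over-proxy case described in this report.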

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24599>
_______________________________________