New submission from Francesco Cosoleto:
urllib fails to read URL contents, urllib2 crashes Python
Python version:
-
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)]
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Python 2.4.4 (#2, Aug 16 2007, 00:34:54)
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2
-
This works with GNU wget:
-
$ wget -S http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
--08:42:21-- http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
=> `Thomas-Robert_Bugeaud'
Resolving www.recherche.fr... 88.191.11.214
Connecting to www.recherche.fr|88.191.11.214:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 26 Sep 2007 06:42:53 GMT
Server: Apache/2.2.3 (Debian) PHP/5.2.3-0.dotdeb.1 with Suhosin-Patch
X-Powered-By: PHP/5.2.3-0.dotdeb.1
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]
[ <=> ]
267,080 --.--K/s
08:42:42 (14.11 KB/s) - "Thomas-Robert_Bugeaud" saved [267080]
-
Python:
-
>>> import urllib
>>> a = urllib.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>> c = a.read(1024*1024*2)
>>> len(c)
1035220
>>> c[63000:64000]
'he.fr en page d\'accueil\n Partenaires : http://www.cartes.fr/"; target="_blank">Cartes\n
postales http://www.deux.fr/script/";
target="_blank">Rencontres\n gratuites\n http://www.new.fr/"; target="_blank">Noms\n de domaine
gratuits http://www.netencyclo.com/";
target="_blank">Encyclopedia \n http://www.futureobject.com/";
target="_blank">http://www.recherche.fr/images/logo_fo.gif";
border="0" height="25" width="96">\n\n \n \n
\n\n\n\r\n\x00\x00\x00\x00\x00\x00\x00
\x00\x00[...omission...]\x00\x00\x00\x00'
-
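The slice printed above ends in a long run of \x00 bytes, and len(c) is much larger than the 267080 bytes wget saved. To locate such runs in the returned data, something like the following sketch could be used. It is written in Python 3 syntax; find_nul_runs is a hypothetical helper (not part of urllib), min_len=8 is an arbitrary threshold, and the sample bytes are synthetic rather than taken from the real page.

```python
import re


def find_nul_runs(data, min_len=8):
    """Return (offset, length) for each run of at least min_len NUL bytes.

    Hypothetical diagnostic helper; `data` is the bytes object returned by read().
    """
    # The raw bytes pattern lets the regex engine interpret \x00 itself.
    pattern = re.compile(br"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# Synthetic stand-in for the response body, not the real page contents.
sample = b"abc\r\n" + b"\x00" * 10 + b"def" + b"\x00" * 3
print(find_nul_runs(sample))
```

Applied to the value of c from the session above, the first offset reported would show where the valid page content stops.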
As above, but with the urllib2 module instead of urllib:
-
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: '\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00[...omission...]\x00\x00\x00\x00\x00\x00\x00
\
-
As above, but with Python 2.4:
-
>>> import urllib2
>>> a = urllib2.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>>
>>> c = a.read(1024*1024*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/socket.py", line 295, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int():
-
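Both tracebacks end at chunk_left = int(line, 16), which is where the chunk-size line of a "Transfer-Encoding: chunked" body is parsed as a hexadecimal number. The following is a minimal sketch of that kind of parsing, in Python 3 syntax, to illustrate why a line of NUL bytes (or an empty line) fails there. It is not httplib's actual code: parse_chunk_size and decode_chunked are hypothetical names, the whole body is assumed to be in memory already, and trailers are ignored.

```python
def parse_chunk_size(line):
    """Parse a chunk-size line such as b'1a2f\r\n' or b'1a2f;ext=1\r\n'."""
    # Drop any chunk extension after ';' and surrounding whitespace.
    size_field = line.split(b";", 1)[0].strip()
    # Raises ValueError if the field is empty or not hexadecimal (e.g. NUL bytes).
    return int(size_field, 16)


def decode_chunked(raw):
    """Decode a complete chunked body held in memory (illustrative only)."""
    body = []
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)            # end of the chunk-size line
        size = parse_chunk_size(raw[pos:eol + 2])
        if size == 0:                            # a zero-size chunk ends the body
            break
        start = eol + 2
        body.append(raw[start:start + size])
        pos = start + size + 2                   # skip the CRLF after the chunk data
    return b"".join(body)


print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))

# A size line consisting of NUL bytes is rejected, as in the tracebacks above.
try:
    decode_chunked(b"5\r\nhello\r\n" + b"\x00" * 8 + b"\r\n")
except ValueError as exc:
    print("ValueError:", exc)
```

In this sketch a well-formed body decodes normally, while the second call raises ValueError once the parser reaches the NUL-filled size line.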
Regards,
Francesco Cosoleto
--
components: None
messages: 56143
nosy: cosoleto
severity: normal
status: open
title: urllib fail to read URL contents, urllib2 crash Python
type: crash
versions: Python 2.5
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1205>
__