New submission from Francesco Cosoleto: urllib fail to read URL contents, urllib2 crash Python
Python version:
-------------------------
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)]

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32

Python 2.4.4 (#2, Aug 16 2007, 00:34:54)
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2
-------------------------

Working with GNU wget:
-------------------------
$ wget -S http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
--08:42:21--  http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
           => `Thomas-Robert_Bugeaud'
Risoluzione di www.recherche.fr in corso... 88.191.11.214
Connessione a www.recherche.fr|88.191.11.214:80... connesso.
HTTP richiesta inviata, aspetto la risposta...
  HTTP/1.1 200 OK
  Date: Wed, 26 Sep 2007 06:42:53 GMT
  Server: Apache/2.2.3 (Debian) PHP/5.2.3-0.dotdeb.1 with Suhosin-Patch
  X-Powered-By: PHP/5.2.3-0.dotdeb.1
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=UTF-8
Lunghezza: non specificato [text/html]

    [ <=> ] 267,080       --.--K/s

08:42:42 (14.11 KB/s) - "Thomas-Robert_Bugeaud" salvato [267080]
-------------------------

Python:
-------------------------
>>> import urllib
>>> a = urllib.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>> c = a.read(1024*1024*2)
>>> len(c)
1035220
>>> c[63000:64000]
'he.fr en page d\'accueil</a><br>\n <span>Partenaires :</span> <a href="http://www.cartes.fr/" target="_blank">Cartes\n postales</a> <a href="http://www.deux.fr/script/" target="_blank">Rencontres\n gratuites\n </a> <a href="http://www.new.fr/" target="_blank">Noms\n de domaine gratuits</a> <a href="http://www.netencyclo.com/" target="_blank">Encyclopedia</a> </p>\n <p style="text-align:center;"><a href="http://www.futureobject.com/" target="_blank"><img src="http://www.recherche.fr/images/logo_fo.gif" border="0" height="25" width="96"></a></p>\n\n </p>\n </div>\n </div><!-- site -->\n</body>\n</html>\n\r\n\x00\x00\x00\x00\x00\x00\x00 \x00\x00[...omission...]\x00\x00\x00\x00'
-------------------------

As above, but with urllib2 module instead of urllib:
-------------------------
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: '\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00[...omission...]\x00\x00\x00\x00\x00\x00\x00 \
-------------------------

As above, but with Python 2.4:
-------------------------
>>> import urllib2
>>> a = urllib2.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>>
>>> c = a.read(1024*1024*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/socket.py", line 295, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int():
-------------------------

Regards,
Francesco Cosoleto

----------
components: None
messages: 56143
nosy: cosoleto
severity: normal
status: open
title: urllib fail to read URL contents, urllib2 crash Python
type: crash
versions: Python 2.5

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1205>
__________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
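
For readers unfamiliar with chunked transfer encoding, the following is a minimal sketch of why the tracebacks above end in `int(line, 16)`. It is not httplib's implementation: `read_chunked` is a hypothetical helper, the sample bodies are invented for illustration, and the code targets Python 3 rather than the 2.4/2.5 interpreters in the report. What the server actually sent after the HTML is not known from the report; the sketch only assumes that NUL bytes arrive where a hexadecimal chunk-size line is expected.

```python
import io


def read_chunked(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Simplified sketch (hypothetical helper): ignores trailers and does
    not enforce any size limits.
    """
    body = []
    while True:
        line = stream.readline()
        # Drop optional chunk extensions (";name=value") and the CRLF.
        size_field = line.split(b";", 1)[0].strip()
        # The chunk size is hexadecimal; anything else raises ValueError,
        # which is the failure shown in the tracebacks above.
        chunk_size = int(size_field, 16)
        if chunk_size == 0:
            break  # a zero-sized chunk terminates the body
        body.append(stream.read(chunk_size))
        stream.read(2)  # discard the CRLF that follows each chunk
    return b"".join(body)


# A well-formed body: two chunks followed by the terminating "0" chunk.
good = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
decoded = read_chunked(good)

# Invented input: NUL bytes where the next chunk-size line should be.
bad = io.BytesIO(b"5\r\nhello\r\n" + b"\x00" * 16 + b"\r\n")
try:
    read_chunked(bad)
    error = None
except ValueError as exc:
    error = exc
```

In this sketch the well-formed body decodes normally, while the NUL-padded one raises `ValueError` at the size-parsing step, the same point at which the urllib2 reads in the report fail.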