[issue1205] urllib fail to read URL contents, urllib2 crash Python

2007-09-26 Thread Francesco Cosoleto

New submission from Francesco Cosoleto:

urllib fails to read URL contents, urllib2 crashes Python

Python version:
-
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43) 
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)]

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32

Python 2.4.4 (#2, Aug 16 2007, 00:34:54) 
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2

-

Working with GNU wget:
-
$ wget -S http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
--08:42:21--  http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
   => `Thomas-Robert_Bugeaud'
Resolving www.recherche.fr... 88.191.11.214
Connecting to www.recherche.fr|88.191.11.214:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 26 Sep 2007 06:42:53 GMT
  Server: Apache/2.2.3 (Debian) PHP/5.2.3-0.dotdeb.1 with Suhosin-Patch
  X-Powered-By: PHP/5.2.3-0.dotdeb.1
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]

    [ <=> ] 267,080   --.--K/s 

08:42:42 (14.11 KB/s) - "Thomas-Robert_Bugeaud" saved [267080]
-

Python:
-
>>> import urllib
>>> a = urllib.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>> c = a.read(1024*1024*2)
>>> len(c)   
1035220

>>> c[63000:64000]
'he.fr en page d\'accueil\n  Partenaires : http://www.cartes.fr/"; target="_blank">Cartes\n  
postales  http://www.deux.fr/script/"; 
target="_blank">Rencontres\n  gratuites\n    http://www.new.fr/"; target="_blank">Noms\n  de domaine 
gratuits  http://www.netencyclo.com/"; 
target="_blank">Encyclopedia \n  http://www.futureobject.com/"; 
target="_blank">http://www.recherche.fr/images/logo_fo.gif"; 
border="0" height="25" width="96">\n\n  \n \n 
\n\n\n\r\n\x00\x00\x00\x00\x00\x00\x00
\x00\x00[...omission...]\x00\x00\x00\x00'
-
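
The NUL bytes in the slice above can be located programmatically. The
following is a minimal sketch, not part of the original report: the helper
name nul_padding_report and the sample data are hypothetical, and it is
written for current Python (bytes literals) rather than Python 2.5.

```python
def nul_padding_report(body):
    """Return (offset of first NUL byte or -1, NUL count, total length)."""
    first = body.find(b"\x00")
    return first, body.count(b"\x00"), len(body)


# Synthetic stand-in for the response: some markup followed by NUL bytes.
sample = b"<html>\n\r\n" + b"\x00" * 16
print(nul_padding_report(sample))
```

Applied to the real response, the first element would show where the valid
content stops and the padding starts.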

As above, but with urllib2 module instead of urllib:

-
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: '\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00[...omission...]\x00\x00\x00\x00\x00\x00\x00
\
-

As above, but with Python 2.4:
-
>>> import urllib2
>>> a = urllib2.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')

>>> 
>>> c = a.read(1024*1024*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/socket.py", line 295, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int(): 
-
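
Both tracebacks end in int(line, 16), the point where httplib parses a
chunk-size line. The sketch below is a simplified, hypothetical decoder
(decode_chunked is not httplib's code, and it works on an in-memory bytes
value in current Python). It only illustrates why a run of NUL bytes where a
hexadecimal chunk size is expected ends in ValueError.

```python
def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body held entirely in memory (bytes)."""
    body = []
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # Ignore any chunk extension that follows ';'.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)  # raises ValueError on non-hex input
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailers are not handled in this sketch
        body.append(raw[pos:pos + size])
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return b"".join(body)


good = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(good))

# NUL bytes where the next chunk size should be, as in the report.
bad = b"4\r\nWiki\r\n" + b"\x00" * 8 + b"\r\n"
try:
    decode_chunked(bad)
except ValueError as exc:
    print("ValueError:", exc)
```

A well-formed stream decodes normally; the stream with NUL padding fails at
the same int(..., 16) conversion shown in the tracebacks.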

Regards,
Francesco Cosoleto

--
components: None
messages: 56143
nosy: cosoleto
severity: normal
status: open
title: urllib fail to read URL contents, urllib2 crash Python
type: crash
versions: Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1205>
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1205] urllib fail to read URL contents, urllib2 crash Python

2008-01-02 Thread Francesco Cosoleto

Francesco Cosoleto added the comment:

Sorry, but I don't understand the reason for closing this issue with 
resolution "wont fix". The problem was reproducible, and its cause was 
explained by several developers. If the problem has been resolved, then 
please change the "resolution" field to "fixed"; otherwise a patch request 
is still pending (see msg56162). No? :-( Of course, as was predictable, the 
bug is no longer reproducible, even with the earlier Python version: 

$ wget -c http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
[...omission...]
02:08:34 (4.28 KB/s) - "Thomas-Robert_Bugeaud" saved [65107] 



Python 2.5.1 (r251:54863, May 18 2007, 16:56:43) 
>>> url = "http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud"
>>> a = urllib.urlopen(url) ; c = a.read(1024 * 1024 * 2)
>>> len(c)
65169
