[issue1205] urllib fail to read URL contents, urllib2 crash Python

2007-09-26 Thread Francesco Cosoleto

New submission from Francesco Cosoleto:

urllib fail to read URL contents, urllib2 crash Python

Python version:
-
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43) 
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)]

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32

Python 2.4.4 (#2, Aug 16 2007, 00:34:54) 
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2

-

Working with GNU wget:
-
$ wget -S http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
--08:42:21--  http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
   => `Thomas-Robert_Bugeaud'
Risoluzione di www.recherche.fr in corso... 88.191.11.214
Connessione a www.recherche.fr|88.191.11.214:80... connesso.
HTTP richiesta inviata, aspetto la risposta... 
  HTTP/1.1 200 OK
  Date: Wed, 26 Sep 2007 06:42:53 GMT
  Server: Apache/2.2.3 (Debian) PHP/5.2.3-0.dotdeb.1 with Suhosin-Patch
  X-Powered-By: PHP/5.2.3-0.dotdeb.1
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=UTF-8
Lunghezza: non specificato [text/html]

[ <=> ] 
267,080   --.--K/s 

08:42:42 (14.11 KB/s) - "Thomas-Robert_Bugeaud" salvato [267080]
-

Python:
-
>>> import urllib
>>> a = urllib.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>> c = a.read(1024*1024*2)
>>> len(c)   
1035220

>>> c[63000:64000]
'he.fr en page d\'accueil\n  Partenaires : http://www.cartes.fr/"; target="_blank">Cartes\n  
postales  http://www.deux.fr/script/"; 
target="_blank">Rencontres\n  gratuites\n    http://www.new.fr/"; target="_blank">Noms\n  de domaine 
gratuits  http://www.netencyclo.com/"; 
target="_blank">Encyclopedia \n  http://www.futureobject.com/"; 
target="_blank">http://www.recherche.fr/images/logo_fo.gif"; 
border="0" height="25" width="96">\n\n  \n \n 
\n\n\n\r\n\x00\x00\x00\x00\x00\x00\x00
\x00\x00[...omission...]\x00\x00\x00\x00'
-

As above, but with urllib2 module instead of urllib:

-
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: '\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00[...omission...]\x00\x00\x00\x00\x00\x00\x00
\
-

As above, but with Python 2.4:
-
>>> import urllib2
>>> a = urllib2.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')

>>> 
>>> c = a.read(1024*1024*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/socket.py", line 295, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int(): 
-

Regards,
Francesco Cosoleto

--
components: None
messages: 56143
nosy: cosoleto
severity: normal
status: open
title: urllib fail to read URL contents, urllib2 crash Python
type: crash
versions: Python 2.5

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1205] urllib fail to read URL contents, urllib2 crash Python

2007-09-26 Thread Gabriel Genellina

Gabriel Genellina added the comment:

This is a server bug. Internet Explorer 6 can't show the page either. 
The response is malformed; it uses chunked transfer, and RFC2616 
section 3.6.1 says "The chunk-size field is a string of hex digits 
indicating the size of the chunk. The chunked encoding is ended by any 
chunk whose size is zero[...]"

After the (first and only) chunk of around 63K, a 0-length chunk should 
follow: a line with one or more "0" digits followed by CR+LF. But the 
server is not sending that last chunk; instead it sends lots of NUL 
bytes, until eventually a CR+LF sequence arrives.
Neither IE nor Python can handle that (IE keeps requesting the page 
again and again). wget is apparently a lot more relaxed and decides 
that the first chunk is good enough. Perhaps urllib/urllib2 could 
handle the error and raise a more meaningful exception in this case, 
but just ignoring the error doesn't appear to be the right thing IMHO.
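
The "more meaningful exception" idea could be sketched roughly as below. This
only illustrates the RFC 2616 chunk framing quoted above; it is not httplib's
code. The names read_chunked and MalformedChunkError are hypothetical, trailers
are ignored, and the sketch assumes a current Python 3 interpreter.

```python
import io


class MalformedChunkError(ValueError):
    """Hypothetical exception for a chunk-size line that is not hex digits."""


def read_chunked(stream):
    """Decode a chunked HTTP body from a binary file-like object (sketch)."""
    body = []
    while True:
        line = stream.readline()
        # A chunk-size line may carry extensions after ';'; keep only the size.
        size_field = line.split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            raise MalformedChunkError("bad chunk-size line: %r" % line[:40])
        if size < 0:
            raise MalformedChunkError("negative chunk size: %r" % line[:40])
        if size == 0:
            break  # a zero-size chunk ends the body
        body.append(stream.read(size))
        stream.read(2)  # consume the CR+LF that follows the chunk data
    return b"".join(body)


# A well-formed body: one 5-byte chunk, then the terminating 0-length chunk.
good = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")
print(read_chunked(good))
```

A stream that sends NUL bytes where the final chunk-size line should be would
make this sketch raise MalformedChunkError instead of a bare ValueError.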

--
nosy: +gagenellina




[issue1206] logging/__init__.py

2007-09-26 Thread Oleg Broytmann

New submission from Oleg Broytmann:

See the thread in the python-dev mailing list:
http://mail.python.org/pipermail/python-dev/2007-September/074732.html

--
components: Library (Lib)
files: __init__.py.patch
messages: 56145
nosy: phd
severity: minor
status: open
title: logging/__init__.py
versions: Python 2.5

--- __init__.py.orig	2007-04-05 22:11:22.0 +0400
+++ __init__.py	2007-09-26 20:44:59.0 +0400
@@ -223,7 +223,7 @@
         # 'Value is %d' instead of 'Value is 0'.
         # For the use case of passing a dictionary, this should not be a
         # problem.
-        if args and (len(args) == 1) and args[0] and (type(args[0]) == types.DictType):
+        if args and len(args) == 1 and isinstance(args[0], dict) and args[0]:
             args = args[0]
         self.args = args
         self.levelname = getLevelName(level)
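
One visible effect of the changed condition, judging only from the diff itself,
is that dict subclasses would also be accepted. A small sketch of that
difference, assuming a current Python 3 interpreter (where types.DictType no
longer exists and plain dict is compared instead); the Params class is
hypothetical:

```python
class Params(dict):
    """Hypothetical dict subclass used only for this illustration."""


args = (Params(user="alice"),)

# Exact type comparison, as in the removed line: rejects the subclass.
old_check = bool(args and len(args) == 1 and args[0] and type(args[0]) == dict)

# isinstance() check, as in the added line: accepts the subclass.
new_check = bool(args and len(args) == 1 and isinstance(args[0], dict) and args[0])

print(old_check, new_check)
```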



[issue1202] zlib.crc32() and adler32() return value

2007-09-26 Thread Guido van Rossum

Guido van Rossum added the comment:

Since it's basically a magic cookie, not a meaningful numeric value, I'd
propose sticking with backwards compatibility and fixing the 64-bit
version to return a signed value.

return x - ((x & 0x80000000) << 1)

anyone?
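
A minimal sketch of that conversion, assuming the mask is meant to be the
32-bit sign bit (0x80000000) and a current Python 3 interpreter, where
zlib.crc32() returns an unsigned value. The helper name to_signed32 is
hypothetical:

```python
import zlib


def to_signed32(x):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    # If the sign bit is set, subtract 2**32; otherwise subtract 0.
    return x - ((x & 0x80000000) << 1)


unsigned = zlib.crc32(b"hello") & 0xFFFFFFFF
signed = to_signed32(unsigned)

# Both values describe the same 32 bits.
assert signed & 0xFFFFFFFF == unsigned
```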

--
nosy: +gvanrossum




[issue1206] logging/__init__.py

2007-09-26 Thread Martin v. Löwis

Changes by Martin v. Löwis:


--
keywords: +patch




[issue1205] urllib fail to read URL contents, urllib2 crash Python

2007-09-26 Thread Guido van Rossum

Guido van Rossum added the comment:

Maybe the French internet is incompatible with the rest of the world? :-)

--
nosy: +gvanrossum




[issue1207] Load tests from path (patch included)

2007-09-26 Thread Bluebird

New submission from Bluebird:

Something very nice about unittest is that it automatically finds the
TestCase classes that you declare, and the test methods of every test
case. This makes adding or removing tests very simple.

For test modules, however, there is nothing to automatically load all the
modules in a given directory. I think that would be very helpful.

Here is my proposal, to add to the set of TestLoader methods:


import glob
import os
from unittest import TestSuite, defaultTestLoader

def loadTestsFromPath( path='', filePattern='test*.py' ):
    '''Load all the TestCases in all the modules of the given path.

    path: directory containing test files
    filePattern: glob pattern to find test modules inside path. Default
    is test*.py

    The path will be converted into an import statement, so anything
    that cannot be imported will not work.  The path must be relative
    to the current directory, and cannot include '.' and '..'
    directories.

    To simply load all the test files of the current directory, pass
    an empty path (the default).

    Return a test suite containing all the tests.
    '''

    if len(path) == 0:
        pathPattern = filePattern
    else:
        pathPattern = path + '/' + filePattern
    pathPattern = os.path.normpath( pathPattern )
    fileList = glob.glob( pathPattern )

    mainSuite = TestSuite()
    for f in fileList:
        # Strip the '.py' suffix and turn path separators into dots.
        importName = f[:-3]
        importName = importName.replace( '\\', '.' )
        importName = importName.replace( '/', '.' )

        suite = defaultTestLoader.loadTestsFromName(importName)
        mainSuite._tests.extend( suite._tests )

    return mainSuite
===

I use it like this: on my project, I have the following directory
organisation:

vy
  + run_all_tests.py
  + tests
     - run_tests.py
     - test_xxx.py
     - test_yyy.py
  + libvy
     + tests
        - run_tests.py
        - test_xxx.py
        - test_yyy.py
  + qvy
     + tests
        - run_tests.py
        - test_xxx.py
        - test_yyy.py

 
I can do either:
- cd libvy/tests && python run_tests.py
- cd qvy/tests && python run_tests.py
- cd tests && python run_tests.py
- run_all_tests.py

Each time I add a new test module, it is automatically picked up by the
test runners thanks to the loadTestsFromPath() feature. It makes it easy
to maintain the global test suite that runs all the tests. That's the most
important one because that test suite is responsible for non-regression.

run_tests.py:
=
from unittest import TestSuite, TextTestRunner
# loadTestsFromPath is the function proposed above.

if __name__ == '__main__':
    mainSuite = TestSuite()
    mainSuite._tests.extend( loadTestsFromPath('.')._tests )
    ttr = TextTestRunner(verbosity=2)
    ttr.run( mainSuite )

run_all_tests.py:
=
from unittest import TestSuite, TextTestRunner
# loadTestsFromPath is the function proposed above.

if __name__ == '__main__':
    mainSuite = TestSuite()
    mainSuite._tests.extend( loadTestsFromPath( 'libvy/tests' )._tests )
    mainSuite._tests.extend( loadTestsFromPath( 'qvy/tests' )._tests )
    mainSuite._tests.extend( loadTestsFromPath( 'tests' )._tests )
    ttr = TextTestRunner(verbosity=2)
    ttr.run( mainSuite )

--
components: Tests
messages: 56148
nosy: bluebird
severity: normal
status: open
title: Load tests from path (patch included)
type: rfe
versions: Python 2.5




[issue1208] Match object should be guaranteed to always be true

2007-09-26 Thread Martin Horcicka

New submission from Martin Horcicka:

Many people expect the match object from the re module to always be
true. They use it this way:

if regexp.match(string): do_something()

Some people do not expect it and use it differently:

if regexp.match(string) is not None: do_something()

Even in the standard library both ways are used. The first way is
simpler and nicer and thus better, in my opinion.

The current implementation of the match object (implemented as the
_sre.SRE_Match object in Modules/_sre.c) seems to guarantee truthiness
(someone should check it), but in fact there is no such guarantee
described in the documentation.
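
For illustration, the two idioms side by side, as a small sketch run on a
current Python 3 interpreter. The pattern and the input strings are made up
for the example:

```python
import re

regexp = re.compile(r"\d+")

hit = regexp.match("123abc")
miss = regexp.match("abc")

# First idiom: rely on the truth value of the match object.
first_style = bool(hit)

# Second idiom: compare against None explicitly.
second_style = hit is not None

# A match of the empty string is still a match object, and still true.
empty = re.match(r"x*", "abc")
print(first_style, second_style, miss, repr(empty.group(0)), bool(empty))
```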

--
components: Documentation, Library (Lib)
messages: 56149
nosy: horcicka
severity: normal
status: open
title: Match object should be guaranteed to always be true
type: behavior




[issue1209] IOError won't accept tuples longer than 3

2007-09-26 Thread Jim Jewett

New submission from Jim Jewett:

EnvironmentError (including subclass IOError) has special treatment when 
constructed with a 2-tuple or 3-tuple.  A four-tuple turns off this special 
treatment (and was used by urllib for that reason).  As of 2.5, a four-tuple 
raises a TypeError instead of just turning off the special treatment.
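
The special treatment of 2-tuples and 3-tuples can be seen in a short sketch.
It assumes a current Python 3 interpreter, where IOError is an alias of
OSError; the four-argument behaviour of 2.5 discussed here is deliberately not
reproduced, since it differs between versions and platforms. The file name is
made up:

```python
import errno

# Two arguments: interpreted as (errno, strerror).
e2 = IOError(errno.ENOENT, "No such file or directory")

# Three arguments: the third one becomes the filename attribute.
e3 = IOError(errno.ENOENT, "No such file or directory", "missing.txt")

# One argument: no special treatment, errno stays unset.
e1 = IOError("plain message")

print(e2.errno, e2.strerror, e2.filename)
print(e3.filename, e3.args)
print(e1.errno, e1.args)
```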

--
components: Extension Modules
messages: 56150
nosy: jimjjewett
severity: normal
status: open
title: IOError won't accept tuples longer than 3
type: behavior
versions: Python 2.5




[issue1205] urllib fail to read URL contents, urllib2 crash Python

2007-09-26 Thread jos


jos added the comment:

Firefox 2.0.0.7 and Safari 2.0.4 can show this page.

In my opinion, Python urllib should be more practical and
provide a way to read this kind of page.

"In general, an implementation must be conservative
in its sending behavior, and liberal in its receiving behavior."
[RFC 791 3.2]

--
nosy: +josm




[issue1760556] logging.FileHandler may throw exception in flush()

2007-09-26 Thread Vinay Sajip

Vinay Sajip added the comment:

Fix checked in to trunk: r58268

--
resolution:  -> fixed
status: open -> closed




[issue1021] logging.basicConfig does not allow to set NOTSET level

2007-09-26 Thread Vinay Sajip

New submission from Vinay Sajip:

Fix checked in to trunk - r58269.

--
resolution:  -> fixed
status: open -> closed




[issue1210] imaplib does not run under Python 3

2007-09-26 Thread Robert T McQuaid

New submission from Robert T McQuaid:

imaplib does not run under Python 3.

The following two-line python program, named testimap.py,
works when run from a Windows XP system shell prompt
using Python 2.5.1, but fails with Python 3.0.  It
appears that the logic does not follow the distinction
between characters and bytes in Python 3.


import imaplib
mail=imaplib.IMAP4("mail.rtmq.infosathse.com")


e:\python25\python   testimap.py
e:\python30\python   testimap.py 2>f:syserr


The last line produced the trace:


Traceback (most recent call last):
  File "testimap.py", line 10, in <module>
    mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
  File "e:\python30\lib\imaplib.py", line 184, in __init__
    self.welcome = self._get_response()
  File "e:\python30\lib\imaplib.py", line 962, in _get_response
    self._append_untagged(typ, dat)
  File "e:\python30\lib\imaplib.py", line 800, in _append_untagged
    if typ in ur:
TypeError: unhashable type: 'bytes'
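
The failing line looks a bytes value up in a dict. A minimal sketch of that
kind of character/bytes mismatch, assuming a current Python 3 interpreter:
there bytes is hashable, so bytearray stands in for the unhashable type, and
the dict contents are hypothetical rather than imaplib's real state.

```python
# Hypothetical untagged-response table, keyed by str.
untagged = {"OK": [b"server ready"]}

typ = b"OK"  # response type as read from the socket

# A bytes key never equals a str key, so the lookup misses.
found_raw = typ in untagged

# Decoding to str first makes the lookup behave as intended.
found_decoded = typ.decode("ascii") in untagged

# An unhashable key fails outright, much like the traceback above.
try:
    bytearray(typ) in untagged
    message = ""
except TypeError as exc:
    message = str(exc)

print(found_raw, found_decoded, message)
```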

--
components: Library (Lib)
messages: 56154
nosy: rtmq
severity: normal
status: open
title: imaplib does not run under Python 3
type: crash
versions: Python 3.0




[issue1752539] RotatingFileHandler.doRollover behave wrong vs. log4j's

2007-09-26 Thread Vinay Sajip

Vinay Sajip added the comment:

I have now had some more time to think about this issue. I don't believe
any changes are warranted, because "Errors should never pass silently.
Unless explicitly silenced." and since rename errors are usually due to
application- or environment-specific conditions, you need to handle
these in application code.

If logging continues to use the existing log file because renaming
fails, then it does not behave according to expectations - e.g. maximum
sizes on log files will not be honoured.

Likewise, logging does not attempt to use makedirs() to ensure that the
parent directory path is created first - a typo in the path would lead
to an unexpected location for the log file.
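
An application that does want the parent directory created can do so
explicitly before constructing the handler. A minimal sketch, assuming a
current Python 3 interpreter; the helper name open_rotating_log and its default
sizes are hypothetical:

```python
import logging.handlers
import os


def open_rotating_log(path, max_bytes=1000000, backups=3):
    """Create the parent directory on purpose, then open the handler."""
    parent = os.path.dirname(os.path.abspath(path))
    # The application opts in to directory creation; logging itself does not.
    os.makedirs(parent, exist_ok=True)
    return logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
```

Keeping this step in application code means a mistyped path still has to pass
through a line the application author wrote and can review.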

While Python logging borrowed a lot from log4j, it is far from a
straight port; the Zen of Python is different from the Zen of Java.

--
resolution:  -> invalid
status: open -> closed




[issue1210] imaplib does not run under Python 3

2007-09-26 Thread Martin v. Löwis

Martin v. Löwis added the comment:

Would you like to work on a patch?

--
nosy: +loewis




[issue1711603] syslog syscall support for SysLogLogger

2007-09-26 Thread Vinay Sajip

Vinay Sajip added the comment:

It's only a bug when it doesn't work according to design. The present
design seems adequate in that it allows syslogging via UDP or domain
sockets. No one else has asked for the functionality of using system
calls. BTW I note that Metalog's home page says it's a modern
replacement for syslogd and klogd - so one might have reasonably
expected socket support.

You can avoid having to patch Python each time by the simple expedient
of creating a SysLogHandler subclass (a one-time operation) and using it
in place of the included SysLogHandler.
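
A rough sketch of such a one-time handler follows. Note that it subclasses
logging.Handler directly rather than SysLogHandler, to avoid the socket setup,
and it relies on the Unix-only syslog module from the standard library. The
class name SyscallSysLogHandler and its write parameter are hypothetical, and
the code assumes a current Python 3 interpreter.

```python
import logging
import syslog  # Unix-only standard library module


class SyscallSysLogHandler(logging.Handler):
    """Send log records to the system log through syslog(3) calls."""

    # Map logging levels to syslog priorities.
    _PRIORITY = {
        logging.DEBUG: syslog.LOG_DEBUG,
        logging.INFO: syslog.LOG_INFO,
        logging.WARNING: syslog.LOG_WARNING,
        logging.ERROR: syslog.LOG_ERR,
        logging.CRITICAL: syslog.LOG_CRIT,
    }

    def __init__(self, write=syslog.syslog):
        logging.Handler.__init__(self)
        # 'write' is injectable so the handler can be exercised without
        # touching the real system log.
        self._write = write

    def emit(self, record):
        try:
            priority = self._PRIORITY.get(record.levelno, syslog.LOG_INFO)
            self._write(priority, self.format(record))
        except Exception:
            self.handleError(record)
```

It would be attached like any other handler, for example with
logging.getLogger().addHandler(SyscallSysLogHandler()).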

--
status: open -> closed




[issue1209] IOError won't accept tuples longer than 3

2007-09-26 Thread Georg Brandl

Georg Brandl added the comment:

Wasn't that already fixed in #1566800?

--
nosy: +georg.brandl
resolution:  -> out of date
status: open -> closed




[issue1208] Match object should be guaranteed to always be true

2007-09-26 Thread Georg Brandl

Georg Brandl added the comment:

Fixed in the docs as r58270.

--
nosy: +georg.brandl
resolution:  -> fixed
status: open -> closed




[issue1711603] syslog syscall support for SysLogLogger

2007-09-26 Thread Luke-Jr


Luke-Jr added the comment:

So label it a "design flaw" if not a bug. Syscalls are the primary and 
only guaranteed method of writing to the system log. Very few 
applications or users use sockets for syslog, and socket support 
should only be required when logging to a remote system.

Even if this is treated as a 'feature' (which it clearly is more 
than), it should still be merged into the current development branch.
