On Thu, 23 Jun 2005 14:23:34 +0200, Eric Brunel <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I just found a problem in the xreadlines method/module when used with
> codecs.open: the codec specified in the open does not seem to be taken into
> account by xreadlines which also returns byte-strings instead of unicode
> strings.
>
> For example, if a file foo.txt contains some text encoded in latin1:
>
> >>> import codecs
> >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
> >>> [l for l in f.xreadlines()]
> ['\xe9\xe0\xe7\xf9\n']
>
> But:
>
> >>> import codecs
> >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
> >>> f.readlines()
> [u'\ufffd\ufffd']
>
> The characters in latin1 are correctly "dumped" with readlines, but are still
> in latin1 encoding in byte-strings with xreadlines.

Replying to myself. One more funny thing:

>>> import codecs, xreadlines
>>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
>>> [l for l in xreadlines.xreadlines(f)]
[u'\ufffd\ufffd']

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this
happens in Python 2.3, but also in Python 2.1, where the implementation for
f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me
here... Reading the source didn't help.

At least, it does provide a workaround...
-- 
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"
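
For readers on a current interpreter, here is a minimal sketch of the same effect. It rests on assumptions not confirmed in the post: it targets Python 3, where xreadlines no longer exists, so read1 stands in as an example of a method the codecs wrapper does not define itself. The temporary file is a hypothetical substitute for foo.txt. One plausible reading of the behaviour above is that such methods are forwarded to the underlying binary file and therefore skip the decoder, while readline-based access goes through it.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: the four Latin-1 bytes from the post plus a newline.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(b"\xe9\xe0\xe7\xf9\n")

# Methods the codecs wrapper defines itself (readline, readlines, iteration)
# go through the decoder, so they yield text with replacement characters.
f = codecs.open(path, "r", "utf-8", "replace")
decoded_lines = [line for line in f]
f.close()

# A method the wrapper does not define is looked up on the underlying
# binary file instead, so it returns undecoded bytes. read1 is used here
# only as a stand-in for the Python 2 xreadlines method.
f = codecs.open(path, "r", "utf-8", "replace")
raw_bytes = f.read1(100)
f.close()

os.remove(path)

print(type(decoded_lines[0]).__name__, repr(decoded_lines[0]))
print(type(raw_bytes).__name__, repr(raw_bytes))
```

Iterating over the wrapper directly, as in the first half of the sketch, is the closest modern counterpart of the xreadlines.xreadlines(f) workaround.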