Re: detecting newline character

Daniel Geržo Sat, 23 Apr 2011 13:28:50 -0700

On 23.4.2011 21:33, Thomas 'PointedEars' Lahn wrote:

Chris Rebert wrote:

On Sat, Apr 23, 2011 at 11:09 AM, Daniel Geržo <dan...@rulez.sk> wrote:

I need to detect the newline characters used in the file I am reading.
For this purpose I am using the following code:

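import codecs
import contextlib

def _read_lines(self):
    # "rU" requests universal newlines so that fobj.newlines gets populated
    with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
        fobj.readlines()
        # newlines is a tuple when more than one newline style was seen
        if isinstance(fobj.newlines, tuple):
            self.newline = fobj.newlines[0]
        else:
            self.newline = fobj.newlines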

This works fine if I call codecs.open() without an encoding argument; I am
testing with an ASCII English text file, and in that case fobj.newlines
is correctly detected as '\r\n'. However, when I call codecs.open() with
the encoding='ascii' argument, fobj.newlines is None, and I can't figure
out why. Reading the PEP at http://www.python.org/dev/peps/pep-0278/,
I don't see any reason why I would end up with newlines being None after
calling readlines().

Does anyone have an idea?


I would hypothesize that it's an interaction bug between universal
newlines and codecs.open().

[…]
I would speculate that the upshot of this is that codecs.open() ends
up calling built-in open() with a nonsense `mode` of "rUb" or similar,
resulting in strange behavior.

If this explanation is correct, then there are 2 bugs:
1. Built-in open() should treat "b" and "U" as mutually exclusive and
reject mode strings which involve both.
2. codecs.open() should either reject modes involving "U", or be fixed
so that they work as expected.
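
A minimal sketch of one possible workaround, assuming Python 2.6 or later, where io.open() is available. Unlike codecs.open(), which opens the underlying file in binary mode when an encoding is given, io.open() decodes the text and still tracks newlines in universal-newlines mode. The helper name detect_newline is hypothetical and only used for illustration.

```python
import io


def detect_newline(path, encoding="ascii"):
    """Return a newline style found in the file, or None if there is none.

    Hypothetical helper: a sketch, not a drop-in replacement for the
    _read_lines() method discussed in the thread.
    """
    # newline=None turns on universal newlines for reading, so the
    # `newlines` attribute is filled in even though an encoding is given.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.read()
        seen = fobj.newlines

    # `newlines` is None, a single string, or a tuple of strings when the
    # file mixes styles. The tuple order does not necessarily reflect the
    # order in which the styles appear in the file.
    if isinstance(seen, tuple):
        return seen[0]
    return seen
```

Calling detect_newline(self.path) on a file with Windows line endings should give '\r\n', whether or not an encoding is passed.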


You might be correct that it is a bug (already fixed in versions newer than
2.5), since codecs.open() from my Python 2.6 reads as follows:


Well, I am doing this on:
Python 2.7.1 (r271:86832, Mar  7 2011, 14:28:09)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin

So what do you guys advise me to do?
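
One option that does not rely on universal-newline support at all is to read the raw bytes and count the line endings directly. The sketch below assumes an ASCII-compatible encoding (it would miscount for UTF-16, for example) and reads the whole file into memory; the function name sniff_newline and its default parameter are hypothetical.

```python
def sniff_newline(path, default=None):
    """Return the most frequent newline style in the file as a string.

    Hypothetical helper: assumes an ASCII-compatible encoding, so that
    the bytes 0x0D and 0x0A only ever occur as line terminators.
    """
    with open(path, "rb") as fobj:
        data = fobj.read()

    # Count "\r\n" first, then subtract it from the raw "\r" and "\n"
    # totals so that each line ending is counted exactly once.
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf

    # max() returns the first maximal item, so ties are resolved in the
    # order listed here: "\r\n", then "\n", then "\r".
    candidates = [(crlf, "\r\n"), (lf, "\n"), (cr, "\r")]
    best_count, best = max(candidates, key=lambda pair: pair[0])
    return best if best_count else default
```

The result can then be stored in self.newline, and the file can be reopened with codecs.open() and an explicit encoding to read the decoded lines.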

--
S pozdravom / Best regards
  Daniel Gerzo
