On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <[EMAIL PROTECTED]> wrote:
>Hello all,
>
>I'm trying to detect line endings used in text files. I *might* be
>decoding the files into unicode first (which may be encoded using
>multi-byte encodings) - which is why I'm not letting Python handle the
>line endings.
>
>Is the following safe and sane :
>
>text = open('test.txt', 'rb').read()
>if encoding:
>    text = text.decode(encoding)
>ending = '\n' # default
>if '\r\n' in text:
>    text = text.replace('\r\n', '\n')
>    ending = '\r\n'
>elif '\n' in text:
>    ending = '\n'
>elif '\r' in text:
>    text = text.replace('\r', '\n')
>    ending = '\r'
>
>
>My worry is that if '\n' *doesn't* signify a line break on the Mac,
>then it may exist in the body of the text - and trigger ``ending =
>'\n'`` prematurely ?
>
Are you guaranteed that text bodies don't contain escape or quoting
mechanisms for binary data where it would be a mistake to convert or
delete an '\r'? (E.g., I think XML CDATA might be an example).

Regards,
Bengt Richter
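
For what it's worth, here is a minimal sketch of a detection step that only
inspects the text and never converts or deletes anything, so embedded '\r'
or '\n' characters are left alone. It assumes the text has already been
decoded to a str, and it picks the most frequent ending rather than the
first one found, so a stray '\n' in a '\r'-terminated file does not win.
The name detect_line_ending and the majority-count heuristic are my own
assumptions, not anything from the code quoted above.

```python
def detect_line_ending(text, default='\n'):
    """Guess the dominant line ending of text without modifying it."""
    crlf = text.count('\r\n')
    # Each '\r\n' pair also contains one '\r' and one '\n', so subtract
    # the pairs to get the counts of lone '\n' and lone '\r'.
    counts = {
        '\r\n': crlf,
        '\n': text.count('\n') - crlf,
        '\r': text.count('\r') - crlf,
    }
    # On a tie, max() keeps the first entry, so '\r\n' is preferred.
    ending, hits = max(counts.items(), key=lambda item: item[1])
    # No line breaks at all: fall back to the caller's default.
    return ending if hits else default


sample = 'one\rtwo\rthree\nfour\r'      # '\r' file with one stray '\n'
print(repr(detect_line_ending(sample)))  # '\r'
```

Whether to normalise the text afterwards is a separate decision; keeping
detection read-only means it can be skipped for files where rewriting an
'\r' would be a mistake.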