Bengt Richter wrote:
> On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <[EMAIL PROTECTED]> wrote:
>
> >Hello all,
> >
> >I'm trying to detect line endings used in text files. I *might* be
> >decoding the files into unicode first (which may be encoded using
> >multi-byte encodings) - which is why I'm not letting Python handle the
> >line endings.
> >
> >Is the following safe and sane :
> >
> >text = open('test.txt', 'rb').read()
> >if encoding:
> >    text = text.decode(encoding)
> >ending = '\n' # default
> >if '\r\n' in text:
> >    text = text.replace('\r\n', '\n')
> >    ending = '\r\n'
> >elif '\n' in text:
> >    ending = '\n'
> >elif '\r' in text:
> >    text = text.replace('\r', '\n')
> >    ending = '\r'
> >
> >
> >My worry is that if '\n' *doesn't* signify a line break on the Mac,
> >then it may exist in the body of the text - and trigger ``ending =
> >'\n'`` prematurely ?
> >
> Are you guaranteed that text bodies don't contain escape or quoting
> mechanisms for binary data where it would be a mistake to convert
> or delete an '\r' ? (E.g., I think XML CDATA might be an example).
>
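One way to sidestep the "premature '\n'" worry in the quoted snippet might be to count each kind of ending rather than test for mere presence, so a stray '\n' in a mostly-'\r' file does not win. This is a minimal sketch of that alternative, not the code from the post; it assumes the text has already been decoded, and `detect_line_ending` and `normalise_line_endings` are hypothetical names made up for illustration.

```python
def detect_line_ending(text, default='\n'):
    """Return the most frequent line ending in text, or default if none."""
    crlf = text.count('\r\n')
    # Bare '\r' and bare '\n' are the ones not already part of a '\r\n' pair.
    cr = text.count('\r') - crlf
    lf = text.count('\n') - crlf
    counts = {'\r\n': crlf, '\n': lf, '\r': cr}
    # On a tie, max() keeps the first key in insertion order: '\r\n', '\n', '\r'.
    ending = max(counts, key=counts.get)
    return ending if counts[ending] else default


def normalise_line_endings(text):
    """Convert every '\r\n' and bare '\r' to '\n'."""
    # Replace the pair first so its '\r' is not converted a second time.
    return text.replace('\r\n', '\n').replace('\r', '\n')
```

Counting is still only a heuristic: it says nothing about Bengt's point that some '\r' characters may be data rather than line breaks.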
My personal use case is for reading config files in arbitrary encodings
(so it's not an issue).

How would Python handle opening such files when not in binary mode? That
may be an issue even on Linux - if you open a Windows file and use
splitlines, does Python convert '\r\n' to '\n'? (Or does it leave the
extra '\r's in place, which is *different* to the behaviour under
Windows?)

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

> Regards,
> Bengt Richter
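The splitlines and text-mode questions can be checked directly. The sketch below assumes Python 3, where `open()` accepts a `newline` argument; it says nothing about how the Python versions current when this thread was written behaved. The sample bytes and the temporary file are made up for the test.

```python
import os
import tempfile

data = b'one\r\ntwo\rthree\n'

# str.splitlines() treats '\r\n', '\r' and '\n' as line breaks on any platform
# and drops them unless keepends=True is passed.
lines = data.decode('ascii').splitlines()
lines_with_ends = data.decode('ascii').splitlines(True)

# Write the raw bytes to a scratch file so it can be reopened in text mode.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(data)

try:
    # Default newline=None: '\r\n' and '\r' are translated to '\n' on read.
    with open(path, encoding='ascii') as f:
        translated = f.read()
    # newline='': no translation, the original endings are preserved.
    with open(path, encoding='ascii', newline='') as f:
        untranslated = f.read()
finally:
    os.remove(path)
```

Under that assumption, reading with `newline=''` and then decoding is one way to keep the original endings visible for detection without dropping to binary mode.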