Neil Hodgson wrote:
> Fuzzyman:
>
> > How should I handle line-endings for UTF16 ? Is it possible that other
> > programs (on windows) will have line endings as u'\r\n' ?
>
>    Yes, try Notepad and save as Unicode. For the text
>
> Fuzzy
> End of lines
>
> >>> contents = open("C:\\fuzzy.txt", "rb").read()
> >>> contents
> '\xff\xfeF\x00u\x00z\x00z\x00y\x00\r\x00\n\x00E\x00n\x00d\x00 \x00o\x00f\x00 \x00l\x00i\x00n\x00e\x00s\x00'
> >>>
>
>    The '\r\x00\n\x00' is a u'\r\n'.
>
> > When saving
> > files for that platform should I make the line endings u'\r\n' ? (This
> > sequence obviously encodes to four bytes in UTF16). I would only do
> > this to ensure compatibility with other programs the user may use to
> > create the text files.
>
>    Notepad will read u'\r\n'. It doesn't like '\n' or u'\n'. Some
> applications are OK with other line ends but '\r\n' and u'\r\n' are
> safest on Windows.
>
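As a rough check of the byte counts discussed above, this sketch rebuilds the quoted byte string from the text. It is written for Python 3, where the result is a bytes object and the literals carry a b prefix, and it assumes little-endian UTF-16 with a byte order mark, which is what the quoted session shows. The variable names are only illustrative.

```python
import codecs

# The text from the Notepad example, using Windows line endings.
text = u"Fuzzy\r\nEnd of lines"

# Encode as little-endian UTF-16 and prepend the byte order mark by hand,
# which matches the byte layout shown in the quoted session.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The line ending on its own: two characters, two bytes each.
crlf = u"\r\n".encode("utf-16-le")

print(repr(data))
print(repr(crlf), len(crlf))
```

Writing `data` to a file opened in binary mode ("wb") should give a file that Notepad reads with the line break intact.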

Thanks - so I need to decode to unicode and *then* split on line endings.
The problem is that this means I can't use Python to handle line endings
when I don't know the encoding in advance. In another thread I've posted a
small function that *guesses* the line endings in use.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

> Neil

--
http://mail.python.org/mailman/listinfo/python-list
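
A minimal sketch of the decode-then-split order described in the reply, written for Python 3. The function name split_utf16_lines is hypothetical, and the sketch assumes the data starts with a byte order mark so that the "utf-16" codec can work out the byte order. It is not the guessing function mentioned above.

```python
import codecs


def split_utf16_lines(data):
    """Decode UTF-16 bytes to text, then split on line endings."""
    # The "utf-16" codec reads the byte order mark, picks the matching
    # byte order, and drops the mark from the decoded text.
    text = data.decode("utf-16")
    # splitlines() treats \r\n, \n and \r (among others) as line boundaries.
    return text.splitlines()


sample = codecs.BOM_UTF16_LE + u"Fuzzy\r\nEnd of lines".encode("utf-16-le")
lines = split_utf16_lines(sample)

# Splitting the raw bytes on b"\n" first cuts a two-byte unit in half:
# the second piece starts with a stray NUL byte and has an odd length.
raw_pieces = sample.split(b"\n")

print(lines)
print(raw_pieces)
```

The second print shows why the split has to happen after decoding: the NUL byte that belongs to the newline ends up at the front of the next piece.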