Sybren Stuvel wrote:
> Fuzzyman enlightened us with:
> > My worry is that if '\n' *doesn't* signify a line break on the Mac,
> > then it may exist in the body of the text - and trigger ``ending =
> > '\n'`` prematurely ?
>
> I'd count the number of occurrences of '\r\n', '\n' without a preceding
> '\r' and '\r' without following '\n', and let the majority decide.
>
This is what I came up with. As you can see from the docstring, it
attempts to do sensible(-ish) things in the event of a tie, or when
there are no line endings at all. Comments/corrections welcomed.

I know the tests aren't very useful (because they make no *assertions*,
they won't tell you if it breaks), but you can see what's going on:

import re
import os

rn = re.compile('\r\n')        # CRLF pair
r = re.compile('\r(?!\n)')     # CR not followed by LF
n = re.compile('(?<!\r)\n')    # LF not preceded by CR

# Sequence of (regex, literal, priority) for each line ending
line_ending = [(n, '\n', 3), (rn, '\r\n', 2), (r, '\r', 1)]

def find_ending(text, default=os.linesep):
    r"""
    Given a piece of text, use a simple heuristic to determine the line
    ending in use.

    Returns the value assigned to default if no line endings are found.
    This defaults to ``os.linesep``, the native line ending for the
    machine.

    If there is a tie between two endings, the priority chain is
    ``'\n', '\r\n', '\r'``.
    """
    # Sort by (count, priority); the winner ends up last.
    results = [(len(exp.findall(text)), priority, literal)
               for exp, literal, priority in line_ending]
    results.sort()
    print(results)
    if not sum([m[0] for m in results]):
        return default
    else:
        return results[-1][-1]

if __name__ == '__main__':
    tests = [
        'hello\ngoodbye\nmy fish\n',
        'hello\r\ngoodbye\r\nmy fish\r\n',
        'hello\rgoodbye\rmy fish\r',
        'hello\rgoodbye\n',
        '',
        '\r\r\r \n\n',
        '\n\n \r\n\r\n',
        '\n\n\r \r\r\n',
        '\n\r \n\r \n\r',
    ]
    for entry in tests:
        print(repr(entry))
        print(repr(find_ending(entry)))
        print()

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

> Sybren
> --
> The problem with the world is stupidity. Not saying there should be a
> capital punishment for stupidity, but why don't we just take the
> safety labels off of everything and let the problem solve itself?
> Frank Zappa

--
http://mail.python.org/mailman/listinfo/python-list
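
Since the tests in the post make no assertions, here is a rough sketch of how they might be turned into real checks. It restates find_ending for Python 3 without the debug print, and it picks the winner with max() on (count, priority) instead of sorting, which is a substitution and not the code as posted. The names LINE_ENDINGS, EXPECTED and check_all are hypothetical, and the expected values were worked out by hand from the tie-break rule in the docstring ('\n', then '\r\n', then '\r').

```python
import os
import re

# (regex, literal, priority): on equal counts the higher priority wins.
LINE_ENDINGS = [
    (re.compile(r'(?<!\r)\n'), '\n', 3),   # LF not preceded by CR
    (re.compile(r'\r\n'), '\r\n', 2),      # CRLF pair
    (re.compile(r'\r(?!\n)'), '\r', 1),    # CR not followed by LF
]


def find_ending(text, default=os.linesep):
    """Return the most frequent line ending in text, or default if none."""
    counts = [(len(exp.findall(text)), priority, literal)
              for exp, literal, priority in LINE_ENDINGS]
    # Tuples compare by count first, then priority, so max() applies
    # the same tie-break as sorting and taking the last element.
    count, _, literal = max(counts)
    return literal if count else default


# Each sample from the post paired with the ending it should yield.
EXPECTED = [
    ('hello\ngoodbye\nmy fish\n', '\n'),
    ('hello\r\ngoodbye\r\nmy fish\r\n', '\r\n'),
    ('hello\rgoodbye\rmy fish\r', '\r'),
    ('hello\rgoodbye\n', '\n'),        # 1-1 tie, '\n' has priority
    ('', os.linesep),                  # no endings, fall back to default
    ('\r\r\r \n\n', '\r'),             # 3 lone CR beat 2 lone LF
    ('\n\n \r\n\r\n', '\n'),           # 2-2 tie between LF and CRLF
    ('\n\n\r \r\r\n', '\n'),           # 2-2 tie between LF and lone CR
    ('\n\r \n\r \n\r', '\n'),          # 3-3 tie between LF and lone CR
]


def check_all():
    """Raise AssertionError on the first sample that gives a wrong answer."""
    for text, expected in EXPECTED:
        actual = find_ending(text)
        assert actual == expected, (text, actual, expected)


if __name__ == '__main__':
    check_all()
```

Run as a script, this stays silent when every sample matches and raises AssertionError with the offending input otherwise.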