On 26/01/2009 10:34 AM, Tim Chase wrote:

I believe that using the formulaic "for line in file(FILENAME)" iteration guarantees that each "line" will have at most one '\n' and that it will be at the end (again, a malformed text file with no terminal '\n' may cause it to be absent from the last line).

It seems that you are right -- not that I can find such a guarantee written anywhere. I had armchair-philosophised that writing "foo\n\r\nbar\r\n" to a file in binary mode and reading it on Windows in text mode would be strict and report the first line as "foo\n\n"; I was wrong.
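
For anyone who wants to repeat the experiment, here is a minimal sketch. It assumes Python 3, where open() in text mode translates newlines on every platform by default, rather than the Python 2 on Windows setup discussed above. The helper name read_lines_in_text_mode is made up for illustration.

```python
import os
import tempfile


def read_lines_in_text_mode(raw_bytes):
    """Write raw_bytes in binary mode, then read the file back line by line in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw_bytes)
        # newline=None is the default: '\n', '\r\n' and '\r' all come back as '\n'.
        with open(path, 'r', encoding='ascii', newline=None) as f:
            return list(f)
    finally:
        os.remove(path)


lines = read_lines_in_text_mode(b"foo\n\r\nbar\r\n")
print(lines)  # ['foo\n', '\n', 'bar\n']
```

The first line comes back as "foo\n", not "foo\n\n": the bare '\n' and the following '\r\n' are treated as two separate line endings.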


So, we are left with the unfortunately awkward
    if line.endswith('\n'):
        line = line[:-1]
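
To make the difference between the two approaches concrete, here is a small sketch in Python 3. The function names chomp and strip_eol are invented for this example, and rstrip('\r\n') is only an assumption about what the "get rid of anything that resembles a CR/LF" approach looks like; the actual code isn't shown in this message.

```python
def chomp(line):
    # Strict version: remove exactly one trailing '\n' if present, nothing else.
    if line.endswith('\n'):
        line = line[:-1]
    return line


def strip_eol(line):
    # Forgiving version (assumed): remove any run of trailing CR and LF characters.
    return line.rstrip('\r\n')


for sample in ('foo\n', 'foo\r\n', 'foo\n\n', 'foo'):
    print(repr(sample), repr(chomp(sample)), repr(strip_eol(sample)))
```

The two agree on well-formed input and differ only when a stray '\r' or an extra '\n' is left at the end of the line.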

You're welcome to it, but I'll stick with my more DWIM solution of "get rid of anything that resembles an attempt at a CR/LF".

Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell the OP exactly what you meant).

My approach to DWIM with data is, given
   norm_space = lambda s: u' '.join(s.split())
to break the line up into fields first (just in case the field delimiter is '\t'), then apply norm_space to each field. This gets rid of your '\r' at the end (or start!) of the line, and each run of whitespace characters is replaced by a single space. Whitespace includes NBSP (U+00A0) as an added bonus for being righteous and using Unicode :-)
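
A rough sketch of that recipe, assuming Python 3 (where every str is Unicode, so the u prefix is unnecessary) and a tab as the field delimiter. The name clean_fields is hypothetical; only norm_space comes from the text above.

```python
def norm_space(s):
    # str.split() with no argument splits on runs of any Unicode whitespace,
    # including '\r', '\n', '\t' and NBSP (U+00A0), and drops leading/trailing runs.
    return ' '.join(s.split())


def clean_fields(line, delimiter='\t'):
    # Split on the delimiter first, so a tab delimiter is not swallowed
    # as ordinary whitespace, then normalise each field separately.
    return [norm_space(field) for field in line.split(delimiter)]


fields = clean_fields('\r  foo \u00a0 bar\tbaz   qux \r\n')
print(fields)  # ['foo bar', 'baz qux']
```

Splitting before normalising matters: calling norm_space on the whole line would turn the tab delimiters into spaces and merge the fields.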

Thank goodness I haven't found any of my data-sources using "\n\r" instead, which would require me to left-strip '\r' characters as well. Sigh. My kingdom for competency. :-/

Indeed. I actually got data in that format once from a *x programmer who was so kind as to do it that way just for me because he knew that I use Windows and he thought that's what Windows text files looked like. No kidding.

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
