On 26/01/2009 10:34 AM, Tim Chase wrote:
> I believe that using the formulaic "for line in file(FILENAME)"
> iteration guarantees that each "line" will have at most only one '\n'
> and it will be at the end (again, a malformed text-file with no terminal
> '\n' may cause it to be absent from the last line).
It seems that you are right -- not that I can find such a guarantee
written anywhere. I had armchair-philosophised that writing
"foo\n\r\nbar\r\n" to a file in binary mode and reading it on Windows in
text mode would be strict and report the first line as "foo\n\n"; I was
wrong.
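
For anyone who wants to repeat the experiment without a Windows box: a minimal sketch, assuming Python 3, whose default text mode (newline=None) does universal-newline translation on every platform. The helper name read_text_lines is made up for illustration and is not from this thread.

```python
import os
import tempfile

def read_text_lines(raw_bytes):
    """Write raw_bytes in binary mode, then iterate over the file in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # Assumption: Python 3. newline=None (the default) translates
        # '\r\n' and a lone '\r' to '\n' while reading.
        with open(path, "r", newline=None, encoding="ascii") as f:
            return [line for line in f]
    finally:
        os.remove(path)

lines = read_text_lines(b"foo\n\r\nbar\r\n")
print(lines)  # ['foo\n', '\n', 'bar\n']
```

Each item carries at most one '\n', and only at the end; the "\n\r\n" run comes back as two lines rather than as one "foo\n\n".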
>> So, we are left with the unfortunately awkward
>>     if line.endswith('\n'):
>>         line = line[:-1]
> You're welcome to it, but I'll stick with my more DWIM solution of "get
> rid of anything that resembles an attempt at a CR/LF".
Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell
the OP exactly what you meant).
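
To make the difference between the two approaches concrete, here is a small sketch. The function names are hypothetical, and spelling the DWIM variant as rstrip('\r\n') is my assumption about what "get rid of anything that resembles an attempt at a CR/LF" means; it was not stated in this thread.

```python
def strip_one_newline(line):
    # Strict variant: remove exactly one trailing '\n', nothing else.
    if line.endswith('\n'):
        line = line[:-1]
    return line

def strip_crlf(line):
    # DWIM variant (assumed spelling): remove every trailing '\r' and '\n'.
    return line.rstrip('\r\n')

sample = 'foo\r\n'
print(repr(strip_one_newline(sample)))  # 'foo\r'
print(repr(strip_crlf(sample)))         # 'foo'
```

The strict version leaves a stray '\r' behind when CR/LF data is read without newline translation; the DWIM version also swallows repeated terminators, which may or may not be what the OP wanted.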
My approach to DWIM with data is, given
    norm_space = lambda s: u' '.join(s.split())
to break up the line into fields first (just in case the field delimiter
== '\t') then apply norm_space to each field. This gets rid of your '\r'
at end (or start!) of line, and multiple whitespace characters are
replaced by a single space. Whitespace includes NBSP (U+00A0) as an
added bonus for being righteous and using Unicode :-)
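
A minimal sketch of that split-then-normalise order, assuming Python 3 (where every str is Unicode, so the u prefix is dropped) and a tab delimiter. The helper clean_fields is a name invented for this example.

```python
norm_space = lambda s: ' '.join(s.split())

def clean_fields(line, delimiter='\t'):
    # Split on the delimiter first: calling norm_space on the whole line
    # would treat the tab as ordinary whitespace and merge the fields.
    return [norm_space(field) for field in line.split(delimiter)]

print(clean_fields('  a\u00a0 b\tc   d \r\n'))  # ['a b', 'c d']
print(clean_fields('\rfoo\tbar\r\n'))           # ['foo', 'bar']
```

With no arguments, str.split() splits on any Unicode whitespace, so the NBSP, the '\r' at either end and the runs of spaces all collapse in one pass.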
> Thank goodness I haven't found any of my data-sources using "\n\r"
> instead, which would require me to left-strip '\r' characters as well.
> Sigh. My kingdom for competency. :-/
Indeed. I actually got data in that format once from a *x programmer who
was so kind as to do it that way just for me because he knew that I use
Windows and he thought that's what Windows text files looked like. No
kidding.
Cheers,
John