On Tue, Dec 13, 2016, at 12:25, George Trojan - NOAA Federal wrote:
> >
> > Are repeated newlines/carriage returns significant at all? What about
> > just using re and just replacing any repeated instances of '\r' or '\n'
> > with '\n'? I.e. something like
> > >>> # the_string is your file all read in
> > >>> import re
> > >>> re.sub("[\r\n]+", "\n", the_string)
> > and then continuing as before (i.e. splitting by newlines, etc.)
> > Does that work?
> > Cheers,
> > Thomas
> >
> The '\r\r\n' string is a line separator, though not used consistently in US
> meteorological bulletins. I do not want to eliminate "real" empty lines.
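To make that concern concrete, here is a small sketch on a made-up bulletin string (hypothetical data, not a real NOAA product) where '\r\r\n' separates lines and one genuine empty line sits between the header and the body. The "[\r\n]+" pattern treats the blank line as just more separator characters and merges it away:

```python
import re

# Hypothetical sample: every line ends in '\r\r\n', and there is one
# real empty line between the header and the body.
the_string = "HEADER\r\r\n\r\r\nBODY LINE 1\r\r\nBODY LINE 2\r\r\n"

# Collapse any run of '\r' / '\n' characters into a single '\n'.
collapsed = re.sub("[\r\n]+", "\n", the_string)

# The empty line between HEADER and BODY LINE 1 is gone.
print(collapsed.split("\n"))
```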
I'd do re.sub("\r*\n", "\n", the_string). Any "real" empty lines are almost
certainly going to have two \n characters, regardless of any \r characters.
It looks like what *happens* is that the file, or some part of the file, had
\r\n line endings originally and was "converted" to turn the \n into \r\n,
leaving \r\r\n.

> I was hoping there is a way to prevent read() from making hidden changes
> to the file content.

Pass newline='' into open. With the default newline=None, text mode
translates '\r', '\n' and '\r\n' all to '\n', so each '\r\r\n' is read as
two line breaks; newline='' turns that translation off and read() returns
the line endings exactly as stored.

-- 
https://mail.python.org/mailman/listinfo/python-list
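A minimal sketch combining both suggestions, assuming the bulletin lives in a file (here a temporary file with hypothetical contents). Reading with the default newline=None shows the "hidden change" (each '\r\r\n' comes back as '\n\n'); reading with newline='' keeps the raw endings, which re.sub("\r*\n", "\n", ...) then normalizes while leaving the genuine empty line in place:

```python
import os
import re
import tempfile

# Hypothetical bulletin bytes: '\r\r\n' line separators plus one real empty line.
raw = b"HEADER\r\r\n\r\r\nBODY LINE 1\r\r\nBODY LINE 2\r\r\n"

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name

try:
    # Default newline=None: '\r' and '\r\n' are both translated to '\n',
    # so every '\r\r\n' turns into two line breaks.
    with open(path, encoding="ascii") as fh:
        translated = fh.read()

    # newline='': no translation, read() returns the line endings as stored.
    with open(path, encoding="ascii", newline="") as fh:
        untouched = fh.read()
finally:
    os.remove(path)

# Drop any '\r' characters that precede '\n'; a genuine empty line
# still shows up as two consecutive '\n'.
normalized = re.sub("\r*\n", "\n", untouched)

print(repr(translated))
print(repr(untouched))
print(normalized.split("\n"))
```

The design choice here is to keep the file read lossless and do the normalization explicitly, so it is visible in the code which line endings are being rewritten.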