Splitting text into lines

George Trojan - NOAA Federal Tue, 13 Dec 2016 08:49:24 -0800

I have files containing ASCII text with lines separated by '\r\r\n'.
Example:


$ od -c FTAK31_PANC_131140.1481629265635
0000000   F   T   A   K   3   1       P   A   N   C       1   3   1   1
0000020   4   0  \r  \r  \n   T   A   F   A   B   E  \r  \r  \n   T   A
0000040   F  \r  \r  \n   P   A   B   E       1   3   1   1   4   0   Z
0000060       1   3   1   2   /   1   4   1   2       0   7   0   1   0
0000100   K   T       P   6   S   M       S   C   T   0   3   5       O
0000120   V   C   0   6   0  \r  \r  \n                       F   M   1
0000140   3   2   1   0   0       1   0   0   1   2   G   2   0   K   T
0000160       P   6   S   M       B   K   N   1   0   0       W   S   0
0000200   1   5   /   1   8   0   3   5   K   T  \r  \r  \n
0000220           F   M   1   4   1   0   0   0       0   9   0   1   5
0000240   G   2   5   K   T       P   6   S   M       B   K   N   0   5
0000260   0       W   S   0   1   5   /   1   8   0   4   0   K   T   =
0000300  \r  \r  \n
0000303

What is the proper way of getting a list of lines?
Both
>>> open('FTAK31_PANC_131140.1481629265635').readlines()
['FTAK31 PANC 131140\n', '\n', 'TAFABE\n', '\n', 'TAF\n', '\n',
 'PABE 131140Z 1312/1412 07010KT P6SM SCT035 OVC060\n', '\n',
 '     FM132100 10012G20KT P6SM BKN100 WS015/18035KT\n', '\n',
 '     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=\n', '\n']

and

>>> open('FTAK31_PANC_131140.1481629265635').read().splitlines()
['FTAK31 PANC 131140', '', 'TAFABE', '', 'TAF', '',
 'PABE 131140Z 1312/1412 07010KT P6SM SCT035 OVC060', '',
 '     FM132100 10012G20KT P6SM BKN100 WS015/18035KT', '',
 '     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=', '']

introduce spurious empty (or single-character '\n') strings. I can do this:

>>> [x.rstrip() for x in open('FTAK31_PANC_131140.1481629265635',
...                           'rb').read().decode().split('\n')]
['FTAK31 PANC 131140', 'TAFABE', 'TAF',
 'PABE 131140Z 1312/1412 07010KT P6SM SCT035 OVC060',
 '     FM132100 10012G20KT P6SM BKN100 WS015/18035KT',
 '     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=', '']
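A possibly tidier form of the same binary-mode idea, sketched under the assumption that the files are plain ASCII: split the decoded text on a regular expression that matches '\n' together with any '\r' characters directly before it. The function name split_crlf_lines is hypothetical. The contrast line also shows why decode().splitlines() is no shortcut: str.splitlines() treats a lone '\r' as a line break of its own.

```python
import re


def split_crlf_lines(data: bytes, encoding: str = "ascii") -> list[str]:
    # Hypothetical helper: split on "\n" plus any run of "\r" just before it,
    # so "\r\r\n", "\r\n" and "\n" all count as one line separator.
    text = data.decode(encoding)
    lines = re.split(r"\r*\n", text)
    # A terminating separator leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


raw = b"TAF\r\r\nPABE 131140Z\r\r\n"

# splitlines() breaks at the lone "\r" and again at "\r\n",
# so the empty entries come back even on untranslated text.
print(raw.decode("ascii").splitlines())  # ['TAF', '', 'PABE 131140Z', '']
print(split_crlf_lines(raw))             # ['TAF', 'PABE 131140Z']
```

Unlike x.rstrip(), this leaves trailing spaces inside the report text untouched.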

but it looks cumbersome. In Python 2.x I stripped '\r' before passing the
string to split():

>>> open('FTAK31_PANC_131140.1481629265635').read().replace('\r', '')
'FTAK31 PANC 131140\nTAFABE\nTAF\nPABE 131140Z 1312/1412 07010KT P6SM SCT035 OVC060\n     FM132100 10012G20KT P6SM BKN100 WS015/18035KT\n     FM141000 09015G25KT P6SM BKN050 WS015/18040KT=\n'

but Python 3.x, which opens text files with universal newlines enabled by
default, translates '\r\r\n' into '\n\n' on read().
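A small way to observe the translation without the real file, assuming in-memory bytes stand in for the file contents: io.TextIOWrapper applies the same newline handling as open() in text mode, and its newline argument controls it.

```python
import io

raw = b"TAF\r\r\nPABE\r\r\n"  # two lines terminated as in the files above

# Default newline=None enables universal newlines: "\r\n" and a lone "\r"
# are both translated to "\n", so each "\r\r\n" becomes "\n\n".
translated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii").read()

# newline="" still recognizes all line endings but returns them untranslated.
untranslated = io.TextIOWrapper(
    io.BytesIO(raw), encoding="ascii", newline=""
).read()

print(repr(translated))    # 'TAF\n\nPABE\n\n'
print(repr(untranslated))  # 'TAF\r\r\nPABE\r\r\n'
```

Since open() accepts the same newline argument, open(path, newline='').read().replace('\r', '') should behave like the Python 2.x version above.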

Ideally I'd like to have code that accepts both '\r\r\n' and '\n' as line
separators.
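One possible approach, sketched under the assumption that the files are ASCII: open them with newline='\n', so lines end only at '\n' and any preceding '\r' characters are returned untranslated, then strip the whole terminator from each line. The function name read_lines is hypothetical.

```python
def read_lines(path: str, encoding: str = "ascii") -> list[str]:
    # Hypothetical helper. newline="\n": lines are terminated only by "\n",
    # and any "\r" before it is kept instead of becoming an extra line break.
    with open(path, encoding=encoding, newline="\n") as f:
        # Strip the "\n" together with however many "\r" precede it, so
        # "\r\r\n", "\r\n" and "\n" endings all give the same result.
        return [line.rstrip("\r\n") for line in f]
```

Iterating the file avoids the trailing '' that split('\n') produces, and rstrip('\r\n') keeps any trailing spaces that belong to the text.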

George
-- 
https://mail.python.org/mailman/listinfo/python-list
