Flynn, Stephen (L & P - IT) wrote:
> Python 3.2, as in the subject, although I also have 2.7 on this machine
> too.
>
> I have some data which contains text separated with field delimiters
> (|~) and a record terminator (||)
>
> 123456009999990|~52299999|~9999990|~0|~4|~1|~2006-09-08|~13:29:39|~some
> text.|~xxxxxxx, xxxxx|~||
> 123456009999991|~52299999|~1999999|~0|~4|~1|~2009-06-05|~15:25:25|~some
> more text|~xxxxx, xxxxxxa|~||
> 123456009999992|~51199999|~9999998|~8253265|~5|~11|~2011-07-19|~16:55:03
> |~Some Split text over
> serveral
> lines
> |~Aldxxxxe, Mxxxx|~||
> 123456009999993|~59999999|~2999999|~8253265|~5|~11|~2011-07-11|~15:06:53
> |~Yet more text:
> which has been split up with
> carriage returns, line feeds or possibly both, depending upon your
> operating system.
> |~Imxxx, xxxxxxed|~||
>
> I'm trying to reformat this data so that each record terminated with a
> "||" is on a single line, as in
>
> 123456009999990|~52299999|~9999990|~0|~4|~1|~2006-09-08|~13:29:39|~some
> text.|~xxxxxxx, xxxxx|~||
> 123456009999991|~52299999|~1999999|~0|~4|~1|~2009-06-05|~15:25:25|~some
> more text|~xxxxx, xxxxxxa|~||
> 123456009999992|~51199999|~9999998|~8253265|~5|~11|~2011-07-19|~16:55:03
> |~Some Split text over serveral lines|~Aldxxxxe, Mxxxx|~||
> 123456009999993|~59999999|~2999999|~8253265|~5|~11|~2011-07-11|~15:06:53
> |~Yet more text: which has been split up with carriage returns, line
> feeds or possibly both, depending upon your operating system.|~Imxxx,
> xxxxxxed|~||
>
> I've written the following code as a first attempt:
>
> ifile=r"C:\Documents and Settings\flynns\Desktop\sample-DCLTBCNTH.txt"
> ofile=r"C:\Documents and Settings\flynns\Desktop\output-DCLTBCNTH.txt"
>
> f=open(ifile, mode="rb")
> out=open(ofile, mode="w")
> line=f.readline()
>
> while (line) :
>     if '||' in str(line):
>         print(str(line), file=out)
>     else:
>         print(str(line), end='', file=out)
>     line=f.readline()
>
> if __name__ == '__main__':
>     pass
>
> The code attempts to read each line of the input file, and if it
> contains a "||", print the line to an output file. If it doesn't contain
> a "||" it emits the record without any carriage returns or line feeds
> and grabs another line from the input file.
>
> Whilst the "logic" seems to be working the output file I get out looks
> like this:
>
> b'123456009999990|~52299999|~9999990|~0|~4|~1|~2006-09-08|~13:29:39|~som
> e text.|~xxxxxxx, xxxxx|~||\r\n'
> b'123456009999991|~52299999|~1999999|~0|~4|~1|~2009-06-05|~15:25:25|~som
> e more text|~xxxxx, xxxxxxa|~||\r\n'
> b'123456009999992|~51199999|~9999998|~8253265|~5|~11|~2011-07-19|~16:55:
> 03|~Some Split test over\r\n'b' serveral\r\n'b' lines\r\n'b'|~Aldxxxxe,
> Mxxxx|~||\r\n'
> b'123456009999993|~59999999|~2999999|~8253265|~5|~11|~2011-07-11|~15:06:
> 53|~Yet more text: \r\n'b' which has been split up with\r\n'b' carriage
> returns, line feeds or possibly both, depending upon your operating
> system.\r\n'b'|~Imxxx, xxxxxxed|~||\r\n'
>
> This makes sense to me as I'm writing the file out in text mode and the
> \r and \n in the input stream are being interpreted as simple text.
>
> However, if I try to write the file out in binary mode, I get a
> traceback:
>
> Traceback (most recent call last):
>   File "C:\Documents and
> Settings\flynns\workspace\joinlines\joinlines\joinlines.py", line 10, in
> <module>
>     print(str(line), file=out)
> TypeError: 'str' does not support the buffer interface
>
> Is there a method of writing out a binary mode file via print() and
> making use of the end keyword?
>
> If there's not, I presume I'll need to remove the \r\n from "line" in my
> else: section and push the amended data out via an out.write(line). How
> does one amend bytes in a "line" object?

In binary mode the lines you are reading are bytes, not str objects. If you
want to convert from bytes to str, use the decode method. Compare:

>>> line = b"whatever"
>>> print(str(line))
b'whatever'
>>> print(line.decode())
whatever

However, I don't see why you have to open your file in binary mode.
Something like

with open(infile) as instream:
    with open(outfile, "w") as outstream:
        for line in instream:
            if not line.endswith("||\n"):
                line = line.rstrip("\n")
            outstream.write(line)

should do the right thing.
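
On the question of amending bytes: bytes objects are immutable, but they have most of the str methods (rstrip, endswith, replace and so on), as long as the arguments are bytes too. If you do want to stay in binary mode, something along these lines might work. It is only a sketch: join_records and RECORD_END are names made up for the illustration, the sample data is invented, and it assumes that "||" never appears at the end of a line inside a text field.

```python
import io

RECORD_END = b"||"  # record terminator taken from the sample data


def join_records(instream, outstream):
    """Copy instream to outstream with each '||'-terminated record on one line.

    Both arguments are assumed to be binary (bytes) file-like objects.
    """
    for line in instream:
        # rstrip returns a new bytes object without trailing \r or \n
        stripped = line.rstrip(b"\r\n")
        outstream.write(stripped)
        if stripped.endswith(RECORD_END):
            # end of a record: finish the output line
            outstream.write(b"\n")


# In-memory demonstration with invented data instead of real files.
sample = (b"1|~some text.|~||\r\n"
          b"2|~Split text over\r\n several\r\n lines\r\n|~name|~||\r\n")
result = io.BytesIO()
join_records(io.BytesIO(sample), result)
print(result.getvalue().decode("ascii"))
```

With real files you would pass open(ifile, "rb") and open(ofile, "wb") instead of the BytesIO objects. Note that print() always produces str, so it cannot write to a file opened in binary mode; out.write() with bytes is the way to go there.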