On 2/11/12 18:25:09, Sacha Rook wrote:
> I have a problem with a csv file from a supplier, so they export data to csv
> however the last column in the record is a description which is marked up
> with html.
>
> trying to automate the processing of this csv to upload elsewhere in a
> useable format. If i open the csv with csved it looks like all the records
> aren't escaped correctly as after a while i find html tags and text on the
> next line/record.

The example line you gave was correctly escaped: the description starts
with a double quote, and ends several lines later with another double
quote. Double quotes in the HTML are represented by '"'. Maybe csved
doesn't recognize this escape convention?

> If I 'openwith' excel the description stays on the correct line/record?

Excel implements this convention.

> I want to use python to read these records in and output a valid csv with
> the descriptions intact preferably without the html tags so a string of
> text formatted with newline/CR where appropriate.

How about this:

    import csv

    # Python 2: open both files in binary mode, as the csv module expects.
    infile = file("input.csv", "rb")
    outfile = file("output.csv", "wb")
    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    for line in reader:
        # The description is the last column; put it on a single line.
        line[-1] = line[-1].replace("\n", " ")
        print line
        writer.writerow(line)

    infile.close()
    outfile.close()

That will replace the newlines inside the HTML, which your csved doesn't
seem to recognize, by spaces. When viewed as HTML code, spaces have the
same effect as newlines (outside of <pre> elements), so this replacement
shouldn't alter the meaning of the HTML text.

Hope this helps,

-- HansM
--
http://mail.python.org/mailman/listinfo/python-list
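A note for readers on Python 3: the snippet in the reply uses Python 2 idioms (file() and the print statement). Below is a minimal sketch of the same approach for Python 3. The function name flatten_last_column, its parameters, and the UTF-8 default are placeholders chosen for illustration and do not come from the original post. Replacing "\r\n" as well as "\n" is an added assumption, in case the descriptions contain Windows-style line breaks.

```python
import csv


def flatten_last_column(in_path, out_path, encoding="utf-8"):
    """Copy a CSV file, replacing newlines in the last column with spaces.

    Hypothetical helper: the name, parameters, and UTF-8 default are
    assumptions made for this sketch.
    """
    # newline="" leaves line endings untouched so that the csv module can
    # handle newlines embedded in quoted fields itself.
    with open(in_path, newline="", encoding=encoding) as infile, \
            open(out_path, "w", newline="", encoding=encoding) as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            if row:  # blank lines come back as empty lists
                # Assumption: treat "\r\n" like "\n" inside the description.
                row[-1] = row[-1].replace("\r\n", " ").replace("\n", " ")
            writer.writerow(row)
```

Calling flatten_last_column("input.csv", "output.csv") should then write one record per physical line, with double quotes inside the description still escaped by the csv writer.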