On 2012-11-02, Sacha Rook <sachar...@gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier, so they
> export data to csv however the last column in the record is a
> description which is marked up with html.
>
> trying to automate the processing of this csv to upload
> elsewhere in a useable format. If i open the csv with csved it
> looks like all the records aren't escaped correctly as after a
> while i find html tags and text on the next line/record.

Maybe compose a simple parser to disambiguate the lines from the
file. Something like this (you'll have to write is_html, and my
Python 2 is mighty rusty, so you'll have to fix it up. Note that
infile doesn't have to be in binary mode with this scheme, but it
would fail on bizarre newlines in the file):

    import csv

    def parse_records(lines):
        # Tag each line as either an HTML fragment or a parsed CSV row.
        for line in lines:
            if is_html(line):
                yield ('html', line)
            else:
                yield ('csv', csv.reader([line.strip()]).next())

    # Raw strings keep the backslashes in the Windows paths literal.
    infile = open(r'c:\data\input.csv')
    outfile = open(r'c:\data\output.csv', 'wb')
    writer = csv.writer(outfile)

    for tag, rec in parse_records(infile):
        if tag == 'html':
            print rec
        elif tag == 'csv':
            writer.writerow(rec)
        else:
            raise ValueError("Unknown record type %s" % tag)

    infile.close()
    outfile.close()

--
Neil Cerutti
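
For the is_html helper that the reply leaves to the reader, one possible starting point is to check whether a line begins with an HTML tag. This is only a sketch under an assumption the thread does not confirm: that the spilled-over description lines start with a tag such as <p> or </div>. The regular expression, the name _TAG_AT_START, and the sample lines are all made up for illustration, and the code should run unchanged on Python 2 or 3.

```python
import re

# Hypothetical heuristic, not taken from the thread: a line counts as HTML
# spill-over when its first non-blank characters form an opening or closing
# tag, e.g. <p>, </div>, <br/> or <a href="...">.
_TAG_AT_START = re.compile(r"\s*</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")


def is_html(line):
    """Return True if the line appears to begin with an HTML tag."""
    return _TAG_AT_START.match(line) is not None


if __name__ == "__main__":
    # Invented sample lines: one CSV record followed by possible spill-over.
    samples = [
        '1001,Widget,"<p>First paragraph</p>"\n',
        "<p>Second paragraph that spilled over</p>\n",
        "</div>\n",
        "plain continuation text\n",
    ]
    for sample in samples:
        print("%s %r" % (is_html(sample), sample))
```

A check like this will miss continuation lines that carry only plain text, which the original question says can also occur. If the supplier's file has a fixed number of columns, counting the fields that csv.reader returns for each line may be a more reliable test.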