Hi, I have a problem with a CSV file from a supplier. They export their data to CSV, but the last column in each record is a description that is marked up with HTML.
I'm trying to automate the processing of this CSV so it can be uploaded elsewhere in a usable format. If I open the file with CSVed, it looks as though the records aren't escaped correctly: after a while I find HTML tags and text on the next line/record. If I use "Open with" Excel, the description stays on the correct line/record.

I want to use Python to read these records in and output a valid CSV with the descriptions intact, preferably without the HTML tags, i.e. as a string of text formatted with newlines/CRs where appropriate. So far I have this, but I don't know where to go from here. Can someone help me?

    import csv

    # Raw strings keep the backslashes in the Windows paths literal.
    infile = open(r'c:\data\input.csv', 'rb')
    outfile = open(r'c:\data\output.csv', 'wb')
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for line in reader:
        print line
        writer.writerow(line)

The input.csv is set out as follows.

Header row first:

"FileDate","ProductID","Name","StandardPrice","DropshipPrice","SRP","Brand","Xline","InStock","Stock","Barcode","Weight","CategoryID","Category","SmallImage","LargeImage","Description"

A complete record looks like this with the description field populated (some records don't have the description field populated):

"2012-11-01T18:28:45.25+00:00","10198","(Venom) PS2 DVD Remote Control (Black)","3.7800","4.3500","12.9800","Venom","true","In Stock","1","5031300025009","200","1339","PC/Games_Console / Playstation / PS2 / Remote Controls","http://www.atssitecentre.co.uk/images/products/10000/10198.gif","http://www.atssitecentre.co.uk/images/products/10000/10198f.jpg","Never have to unplug your joypad / DVD user friendly / Works up to 30 feet from PS/2 / IR wireless technology."

This is an example of a bad record causing problems. The description field starts with "features:</p> and, as you can see, contains HTML, but because of this format the subsequent HTML tags are added as new records on subsequent lines:

"2012-11-01T18:28:45.25+00:00","11116","3.5 inch Disk Drive Lock","2.9500","2.9500","9.9500","None","true","In Stock","3","077511994166","131","1332","PC/Games_Console / PC Security / General","http://www.atssitecentre.co.uk/images/products/11000/11116.gif","http://www.atssitecentre.co.uk/images/products/11000/11116f.jpg","features:</p> <ul> <li>3 1/2" FDD Lock.</li> <li>Die casting housing and cylinder chrome plated.</li> <li>Lock Cover : PBT + GF 15%. (PLASTIC)</li> <li>2 Keys supplied per lock. <br /></li> </ul>" I know I am far from complete but don't know how to proceed :-) As I said I want to reconstruct a clean record either strip out the html tags or at least escape the records appropriately.. Thanks all -- http://mail.python.org/mailman/listinfo/python-list