David Jobes wrote:

> I was given a badly or poor formatted xml file that i need to convert to
> csv file:

There are no "badly formatted" XML files, only well-formed and malformed ones. Fortunately, the following looks like the beginning of a well-formed one.

> <?xml version="1.0"?>
> <resultset xmlns:dyn="http://exslt.org/dynamic">
> <table name="SIGNATURE">
> <column name="ID" type="String"> </column>
> <column name="NUM" type="Integer"> </column>
> <column name="SEVERITY_ID" type="Integer"> </column>
> <column name="NAME" type="String"> </column>
> <column name="CLASS" type="String"> </column>
> <column name="PRODUCT_CATEGORY_ID" type="Integer"> </column>
> <column name="PROTOCOL" type="String"> </column>
> <column name="TAXONOMY" type="String"> </column>
> <column name="CVE_ID" type="String"> </column>
> <column name="BUGTRAQ_ID" type="String"> </column>
> <column name="DESCRIPTION" type="String"> </column>
> <column name="MESSAGE" type="String"> </column>
> <column name="FILTERTYPE" type="String"> </column>
> <data>
> <r>
> <c>00000001-0001-0001-0001-000000000027</c>
> <c>27</c>
> <c>2</c>
> <c>0027: IP Options: Record Route (RR)</c>
> <c>Network_equip</c>
> <c>10</c>
> <c>ip</c>
> <c>100741885</c>
> <c>2001-0752,1999-1339,1999-0986</c>
> <c>870</c>
> <c></c>
> <c></c>
> <c></c>
> </r>
>
>
> I have been able to load and read the file line by line,

XML has no notion of lines, so don't do that. Instead, let a parser make sense of the document structure.

> but once i get to
> the r line and try to process each c(column) that is where it blows up. I
> need to be able to split the lines and place each one or the r (row) on a
> single line for the csv.
>
> i have a list set for each one of the headers based on the col name field,
> i just have been able to format properly.

Here's a simple script using ElementTree to introduce you to basic XML handling with Python's stdlib.
If you are lucky it might even work ;)

import csv
import sys
from xml.etree import ElementTree

SOURCEFILE = "xml_to_csv.xml"

tree = ElementTree.parse(SOURCEFILE)
table = tree.find("table")

# The header row comes from the name attribute of each <column>.
column_names = [c.attrib["name"] for c in table.findall("column")]

writer = csv.writer(sys.stdout)
writer.writerow(column_names)

# Each <r> becomes one CSV row, each <c> one field.
for row in table.find("data").findall("r"):
    writer.writerow([field.text for field in row.findall("c")])
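
If you want to try the approach without the real file at hand, here is a rough, self-contained variant of the script. It is only a sketch: the function name `xml_to_csv`, the shortened `SAMPLE` document, and the check that every row has as many fields as there are columns are my own assumptions for illustration, not something taken from your data.

```python
import csv
import io
from xml.etree import ElementTree

# Hypothetical, shortened sample in the same shape as the quoted file.
SAMPLE = """<?xml version="1.0"?>
<resultset xmlns:dyn="http://exslt.org/dynamic">
<table name="SIGNATURE">
<column name="ID" type="String"> </column>
<column name="NUM" type="Integer"> </column>
<column name="NAME" type="String"> </column>
<column name="MESSAGE" type="String"> </column>
<data>
<r>
<c>00000001-0001-0001-0001-000000000027</c>
<c>27</c>
<c>0027: IP Options: Record Route (RR)</c>
<c></c>
</r>
</data>
</table>
</resultset>
"""


def xml_to_csv(xml_text):
    """Return the first <table> in xml_text as CSV text, header row first."""
    root = ElementTree.fromstring(xml_text)
    table = root.find("table")
    if table is None:
        raise ValueError("no <table> element found")

    column_names = [c.get("name") for c in table.findall("column")]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(column_names)

    for row in table.iterfind("data/r"):
        # An empty <c></c> has text None; write it as an empty field.
        fields = [c.text or "" for c in row.findall("c")]
        # Assumption: a row that does not match the header is an error.
        if len(fields) != len(column_names):
            raise ValueError(
                "row has %d fields, expected %d" % (len(fields), len(column_names))
            )
        writer.writerow(fields)

    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE), end="")
    # Line breaks between elements carry no meaning for the parser:
    # the same document squeezed onto one line gives the same CSV.
    assert xml_to_csv(SAMPLE.replace("\n", " ")) == xml_to_csv(SAMPLE)
```

Returning a string rather than writing to sys.stdout just makes the result easy to inspect; for a large file you would probably hand csv.writer an open file object instead.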