On Friday, October 10, 2014 8:21:17 AM UTC-4, Peter Otten wrote:
> David Jobes wrote:
>
> > I was given a badly or poor formatted xml file that i need to convert to
> > csv file:
>
> There are no "badly formatted" XML files, only valid and invalid ones.
> Fortunately following looks like the beginning of a valid one.
>
> > <?xml version="1.0"?>
> > <resultset xmlns:dyn="http://exslt.org/dynamic">
> > <table name="SIGNATURE">
> > <column name="ID" type="String"> </column>
> > <column name="NUM" type="Integer"> </column>
> > <column name="SEVERITY_ID" type="Integer"> </column>
> > <column name="NAME" type="String"> </column>
> > <column name="CLASS" type="String"> </column>
> > <column name="PRODUCT_CATEGORY_ID" type="Integer"> </column>
> > <column name="PROTOCOL" type="String"> </column>
> > <column name="TAXONOMY" type="String"> </column>
> > <column name="CVE_ID" type="String"> </column>
> > <column name="BUGTRAQ_ID" type="String"> </column>
> > <column name="DESCRIPTION" type="String"> </column>
> > <column name="MESSAGE" type="String"> </column>
> > <column name="FILTERTYPE" type="String"> </column>
> > <data>
> > <r>
> > <c>00000001-0001-0001-0001-000000000027</c>
> > <c>27</c>
> > <c>2</c>
> > <c>0027: IP Options: Record Route (RR)</c>
> > <c>Network_equip</c>
> > <c>10</c>
> > <c>ip</c>
> > <c>100741885</c>
> > <c>2001-0752,1999-1339,1999-0986</c>
> > <c>870</c>
> > <c></c>
> > <c></c>
> > <c></c>
> > </r>
> >
> > I have been able to load and read the file line by line,
>
> XML doesn't have an idea of lines, so don't do that. Instead let a parser
> make sense of the document structure.
>
> > but once i get to
> > the r line and try to process each c(column) that is where it blows up. I
> > need to be able to split the lines and place each one or the r (row) on a
> > single line for the csv.
> >
> > i have a list set for each one of the headers based on the col name field,
> > i just have been able to format properly.
>
> Here's a simple script using ElementTree, to introduce you to basic xml
> handling with Python's stdlib. If you are lucky it might even work ;)
>
> import csv
> import sys
> from xml.etree import ElementTree
>
> SOURCEFILE = "xml_to_csv.xml"
>
> tree = ElementTree.parse(SOURCEFILE)
> table = tree.find("table")
> column_names = [c.attrib["name"] for c in table.findall("column")]
> writer = csv.writer(sys.stdout)
> writer.writerow(column_names)
> for row in table.find("data").findall("r"):
>     writer.writerow([field.text for field in row.findall("c")])
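For anyone who wants to try the quoted approach without the original file, here is a minimal self-contained sketch. It is not the script from the post: it parses from a string instead of a file and returns the CSV as text. The sample is trimmed to four columns, the closing </data>, </table> and </resultset> tags are my assumption about how the real file ends, and xml_to_csv is a hypothetical helper name.

```python
import csv
import io
from xml.etree import ElementTree

# Trimmed copy of the sample from the original post. The closing tags are an
# assumption; the post only shows the beginning of the file.
SAMPLE = """<?xml version="1.0"?>
<resultset xmlns:dyn="http://exslt.org/dynamic">
<table name="SIGNATURE">
<column name="ID" type="String"> </column>
<column name="NUM" type="Integer"> </column>
<column name="NAME" type="String"> </column>
<column name="MESSAGE" type="String"> </column>
<data>
<r>
<c>00000001-0001-0001-0001-000000000027</c>
<c>27</c>
<c>0027: IP Options: Record Route (RR)</c>
<c></c>
</r>
</data>
</table>
</resultset>
"""


def xml_to_csv(xml_text):
    """Return the first <table> in xml_text as CSV text (hypothetical helper)."""
    root = ElementTree.fromstring(xml_text)
    table = root.find("table")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row comes from the name attribute of each <column> element.
    writer.writerow([c.attrib["name"] for c in table.findall("column")])
    for row in table.find("data").findall("r"):
        # An empty <c></c> has text None; write it as an empty field.
        writer.writerow([field.text or "" for field in row.findall("c")])
    return out.getvalue()


print(xml_to_csv(SAMPLE))
```

The csv module takes care of quoting, so a field such as the CVE_ID list with embedded commas would be wrapped in quotes automatically rather than splitting into extra columns.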

That did it, thank you, and in a lot fewer lines of code than I had; I was trying to use strings and regex. I will read up more on the xml.etree stuff.

-- 
https://mail.python.org/mailman/listinfo/python-list