On Feb 27, 12:50 pm, Hal Styli <silly...@yahoo.com> wrote:
> Hello,
>
> Can someone please help.
> I have a sed solution to the problems below but would like to rewrite
> in python...
>
> I need to strip out some data from a quirky xml file into a csv:
>
> from something like this
>
> < ..... cust="dick" .... product="eggs" ... quantity="12" .... >
> < .... cust="tom" .... product="milk" ... quantity="2" ...>
> < .... cust="harry" .... product="bread" ... quantity="1" ...>
> < .... cust="tom" .... product="eggs" ... quantity="6" ...>
> < ..... cust="dick" .... product="eggs" ... quantity="6" .... >
>
> to this
>
> dick,eggs,12
> tom,milk,2
> harry,bread,1
> tom,eggs,6
> dick,eggs,6
>
> I am new to python and xml and it would be great to see some slick
> ways of achieving the above by using python's XML capabilities to
> parse the original file or python's regex to achive what I did using
> sed.
>
> Thanks for any constructive help given.
>
> Hal

Here is a sample XML file (I named it data.xml):

--------------------------
<orders>
    <order customer="john" product="eggs" quantity="12" />
    <order customer="cindy" product="bread" quantity="1" />
    <order customer="larry" product="tea bags" quantity="100" />
    <order customer="john" product="butter" quantity="1" />
    <order product="chicken" quantity="2" customer="derek" />
</orders>
--------------------------

Code:

--------------------------
import csv
import xml.sax
import xml.sax.handler

# Handle an XML file with the following structure:
# <orders>
#     <order attributes... />
#     ...
# </orders>
class OrdersHandler(xml.sax.handler.ContentHandler):
    def __init__(self, csvfile):
        xml.sax.handler.ContentHandler.__init__(self)
        # Open a csv file for output (Python 3: newline='' lets the
        # csv module control the line endings)
        self.outfile = open(csvfile, 'w', newline='')
        self.csvWriter = csv.writer(self.outfile)

    def startElement(self, name, attributes):
        # Only process the <order ... > element
        if name == 'order':
            # Sort the attribute names so that the columns are written
            # in the same order for every row. We assume every <order>
            # element contains the same attributes.
            attributeNames = sorted(attributes.getNames())

            # Construct a row and write it to the csv file
            row = [attributes.getValue(attrName)
                   for attrName in attributeNames]
            self.csvWriter.writerow(row)

    def endDocument(self):
        # Close the output file explicitly so all rows are flushed
        self.outfile.close()

# Main
datafile = 'data.xml'
csvfile = 'data.csv'
ordersHandler = OrdersHandler(csvfile)
xml.sax.parse(datafile, ordersHandler)
--------------------------

To solve your problem, it is easier to use SAX than DOM. Use SAX to
scan the XML file; whenever the parser encounters the element you care
about (in this case <order ...>), process its attributes. Here, the
handler sorts the attribute names and writes their values as one row
of the csv file. With attributes named customer, product and quantity,
alphabetical order happens to match the column order you asked for.

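The handler above depends on sorted attribute names giving the right
column order, and on every element carrying the same attributes. If
that does not hold for your file, one alternative is to name the
columns explicitly. The following is only a minimal sketch under those
assumptions: the class name OrderedOrdersHandler and its columns
parameter are made up for illustration, and it writes to an in-memory
buffer instead of a file so that it can be run as-is.

```python
import csv
import io
import xml.sax
import xml.sax.handler


class OrderedOrdersHandler(xml.sax.handler.ContentHandler):
    """Hypothetical variant: write one csv row per <order> element,
    taking the columns in an explicitly given order."""

    def __init__(self, outfile, columns):
        xml.sax.handler.ContentHandler.__init__(self)
        self.writer = csv.writer(outfile, lineterminator="\n")
        self.columns = columns

    def startElement(self, name, attributes):
        if name == "order":
            # get() falls back to the default when an attribute is
            # missing, so an incomplete element yields an empty field
            row = [attributes.get(col, "") for col in self.columns]
            self.writer.writerow(row)


sample = """<orders>
  <order customer="john" product="eggs" quantity="12" />
  <order product="chicken" quantity="2" customer="derek" />
  <order customer="cindy" product="bread" />
</orders>"""

out = io.StringIO()
handler = OrderedOrdersHandler(out, ["customer", "product", "quantity"])
xml.sax.parseString(sample.encode("utf-8"), handler)
result = out.getvalue()
print(result, end="")
```

Any attribute not listed in columns is ignored, which may help if your
real elements carry more attributes than the three you want in the csv.
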
--------------------------
References:

SAX Parser:
http://docs.python.org/library/xml.sax.html

SAX Content Handler:
http://docs.python.org/library/xml.sax.handler.html

Attributes Object:
http://docs.python.org/library/xml.sax.reader.html#attributes-objects