On Feb 28, 12:05 am, Stefan Behnel <stefan...@behnel.de> wrote:
> Hal Styli, 27.02.2010 21:50:
> > I have a sed solution to the problems below but would like to rewrite
> > in python...
>
> Note that sed (or any other line based or text based tool) is not a
> sensible way to handle XML. If you want to read XML, use an XML parser.
> They are designed to do exactly what you want in a standard compliant way,
> and they can deal with all sorts of XML formatting and encoding, for example.
>
> > I need to strip out some data from a quirky xml file into a csv:
> >
> > from something like this
> >
> > < ..... cust="dick" .... product="eggs" ... quantity="12" .... >
> > < .... cust="tom" .... product="milk" ... quantity="2" ...>
> > < .... cust="harry" .... product="bread" ... quantity="1" ...>
> > < .... cust="tom" .... product="eggs" ... quantity="6" ...>
> > < ..... cust="dick" .... product="eggs" ... quantity="6" .... >
>
> As others have noted, this doesn't tell much about your XML. A more
> complete example would be helpful.
>
> > to this
> >
> > dick,eggs,12
> > tom,milk,2
> > harry,bread,1
> > tom,eggs,6
> > dick,eggs,6
> >
> > I am new to python and xml and it would be great to see some slick
> > ways of achieving the above by using python's XML capabilities to
> > parse the original file or python's regex to achieve what I did using
> > sed.
>
> It's funny how often people still think that SAX is a good way to solve XML
> problems.
> Here's an untested solution that uses xml.etree.ElementTree:
>
>     from xml.etree import ElementTree as ET
>
>     csv_field_order = ['cust', 'product', 'quantity']
>
>     clean_up_used_elements = None
>     for event, element in ET.iterparse("thefile.xml", events=['start']):
>         # you may want to select a specific element.tag here
>
>         # format and print the CSV line to the standard output
>         print(','.join(element.attrib.get(title, '')
>                        for title in csv_field_order))
>
>         # save some memory (in case the XML file is very large)
>         if clean_up_used_elements is None:
>             # this assigns the clear() method of the root (first) element
>             clean_up_used_elements = element.clear
>         clean_up_used_elements()
>
> You can strip everything dealing with 'clean_up_used_elements' (basically
> the last section) if your XML file is small enough to fit into memory (a
> couple of MB is usually fine).
>
> Stefan

This solution is so beautiful and elegant. Thank you. Now I am off to learn ElementTree. By the way, Stefan, I am using Python 2.6. Do you know the differences between ElementTree and cElementTree?
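
For anyone following along, here is a minimal, self-contained sketch of the iterparse approach quoted above, plus the usual import fallback that touches on the cElementTree question. As far as the standard library goes, xml.etree.cElementTree in Python 2.x is a C implementation of the same API as xml.etree.ElementTree (faster and lighter on memory); it was deprecated in Python 3.3, where ElementTree picks up the C accelerator by itself, and removed in 3.9. Everything about the data is an assumption: the original post elides the real XML, so the `orders`/`order` element names and the sample document are hypothetical. The sketch also listens for 'end' events instead of the 'start' events in the quoted code, so that each element is complete before it is cleared.

```python
import io

try:
    # Python 2.x (and 3.3-3.8): C implementation with the same API
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+ removed cElementTree; ElementTree is accelerated on its own
    import xml.etree.ElementTree as ET

CSV_FIELD_ORDER = ['cust', 'product', 'quantity']

# Hypothetical sample: the real element names and layout are not in the post.
SAMPLE_XML = b"""<orders>
  <order cust="dick" product="eggs" quantity="12"/>
  <order cust="tom" product="milk" quantity="2"/>
  <order cust="harry" product="bread" quantity="1"/>
</orders>"""


def xml_to_csv_lines(source, tag='order'):
    """Return one comma-joined line per element named `tag`."""
    lines = []
    for event, element in ET.iterparse(source, events=('end',)):
        if element.tag == tag:
            # a missing attribute becomes an empty field
            lines.append(','.join(element.attrib.get(name, '')
                                  for name in CSV_FIELD_ORDER))
            # drop the element's attributes and children once it has been used
            element.clear()
    return lines


csv_lines = xml_to_csv_lines(io.BytesIO(SAMPLE_XML))
print('\n'.join(csv_lines))
```

Passing a filename instead of the BytesIO object should work the same way. If attribute values can contain commas or quotes, the csv module's writer would be a safer choice than a plain join.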