Hello,

2010/2/28 Stefan Behnel <stefan...@behnel.de>

> Hal Styli, 27.02.2010 21:50:
> > I have a sed solution to the problems below but would like to rewrite
> > in python...
>
> Note that sed (or any other line based or text based tool) is not a
> sensible way to handle XML. If you want to read XML, use an XML parser.
> They are designed to do exactly what you want in a standard compliant way,
> and they can deal with all sorts of XML formatting and encoding, for
> example.
>
>
> > I need to strip out some data from a quirky xml file into a csv:
> >
> > from something like this
> >
> > < ..... cust="dick" .... product="eggs" ... quantity="12" .... >
> > < .... cust="tom" .... product="milk" ... quantity="2" ...>
> > < .... cust="harry" .... product="bread" ... quantity="1" ...>
> > < .... cust="tom" .... product="eggs" ... quantity="6" ...>
> > < ..... cust="dick" .... product="eggs" ... quantity="6" .... >
>
> As others have noted, this doesn't tell much about your XML. A more
> complete example would be helpful.
>
>
> > to this
> >
> > dick,eggs,12
> > tom,milk,2
> > harry,bread,1
> > tom,eggs,6
> > dick,eggs,6
> >
> > I am new to python and xml and it would be great to see some slick
> > ways of achieving the above by using python's XML capabilities to
> > parse the original file or python's regex to achieve what I did using
> > sed.
>
>

Another solution in this case could be to use an XSLT stylesheet. That way
the input processing is defined in the stylesheet itself. The stylesheet is
test.xsl and the input data is test.xml. The following Python code then
applies the stylesheet to the input data and writes the output to the file foo.

Python code:

#!/usr/bin/python
import sys
import libxml2
import libxslt

# parse and compile the stylesheet
styledoc = libxml2.parseFile("test.xsl")
style = libxslt.parseStylesheetDoc(styledoc)
# parse the input document and apply the stylesheet to it
doc = libxml2.parseFile("test.xml")
result = style.applyStylesheet(doc, None)
# write the transformed output to the file "foo"
style.saveResultToFilename("foo", result, 0)

BR,
Roland

*Example run in Linux:*

rol...@komputer:~/Desktop/XML/XSLT$ ./xslt_test.py
rol...@komputer:~/Desktop/XML/XSLT$ cat foo
john,eggs,12
cindy,bread,1
larry,tea bags,100
john,butter,1
derek,chicken,2
derek,milk,2

*The test.xsl stylesheet:*

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0">

  <!-- text output because we want to have a CSV file -->
  <xsl:output method="text"/>

  <!-- remove all whitespace coming with input XML -->
  <xsl:strip-space elements="*"/>

  <!-- matches any <order> element and extracts the
       customer, product and quantity attributes -->
  <xsl:template match="order">
    <xsl:value-of select="@customer"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@product"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@quantity"/>
    <!-- end each record with a newline -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
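
For comparison, if the libxml2/libxslt bindings are not installed, roughly
the same extraction can be sketched with the standard library alone. This is
only an illustration and not part of the solution above: it assumes Python 3,
it assumes the same <order> elements with customer, product and quantity
attributes that the stylesheet matches, and the sample input (including the
<orders> root element) and the function name orders_to_csv are made up here,
since the real test.xml was not posted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the original test.xml was not posted, so this
# structure (an <orders> root holding <order> elements) is an assumption.
SAMPLE_XML = """<orders>
  <order customer="john" product="eggs" quantity="12"/>
  <order customer="cindy" product="bread" quantity="1"/>
  <order customer="larry" product="tea bags" quantity="100"/>
</orders>
"""

# Attribute names taken from the test.xsl stylesheet.
FIELDS = ("customer", "product", "quantity")


def orders_to_csv(xml_text):
    """Return CSV text with one line per <order> element."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # iter() visits matching elements at any depth, much like the
    # match="order" template in the stylesheet does.
    for order in root.iter("order"):
        # A missing attribute becomes an empty field instead of an error.
        writer.writerow([order.get(name, "") for name in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    print(orders_to_csv(SAMPLE_XML), end="")
```

Going through the csv module rather than joining the values with commas by
hand means that a value containing a comma or a quote would be quoted
properly, which the plain-text XSLT output above does not do.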