André wrote:

> I'm using elementtree to process some html files, by building a tree,
> manipulating it, and writing it back.  One problem I encounter is that
> elementtree converts some symbols in an unwanted way.  For example, the
> symbol ">" is converted to "&gt;".  This is fine in html code, but not
> if the page includes some script like the example below
>
> <script type="text/javascript">
> function init() {
>   var a = 1;
>   if (a > 0)
>   {
>     alert("Spam alert");
>   }
> }
> </script>
>
> The resulting code, with "a &gt; 0" is not understood by the browser...

ET is an XML library, so it writes XML-style escapes everywhere, including
inside script elements.  an XHTML-aware browser has no problems dealing
with that, but I assume you might want to support tag soup parsers like
IE6 as well ;-)

to write true HTML 4.0, you need an HTML serializer.  there's a good one
in Kid (though I don't know how hard it would be to use that one with a
preexisting tree, rather than a Kid "event stream").

another alternative is the HTMLTree class in Ian Bicking's commentary
application:

    http://svn.pythonpaste.org/Paste/apps/Commentary/trunk/commentary/dumbpath.py

(you may have to tweak that module somewhat to be able to use it without
elementtidy).

</F>
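
A minimal sketch of the difference between XML and HTML serialization, assuming a newer ElementTree than the one discussed in this thread: the xml.etree.ElementTree module in current Python 3 accepts method="html" in tostring(), and that mode leaves the text of script and style elements unescaped. The element and script body below are made up for illustration, and this is not the Kid or HTMLTree approach mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical script element, built by hand for illustration.
script = ET.Element("script", {"type": "text/javascript"})
script.text = 'if (a > 0) { alert("Spam alert"); }'

# Default XML serialization escapes ">" in all text, script bodies included.
as_xml = ET.tostring(script, encoding="unicode")

# HTML serialization writes script (and style) text without escaping.
as_html = ET.tostring(script, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

If the installed ElementTree does not support method="html", a separate HTML serializer such as the ones mentioned above is still needed.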