André wrote:
> I'm using elementtree to process some html files, by building a tree,
> manipulating it, and writing it back. One problem I encounter is that
> elementtree converts some symbols in an unwanted way. For example, the
> symbol ">" is converted to "&gt;". This is fine in html code, but not
> if the page includes some script like the example below
>
> <script type="text/javascript">
> function init() {
> var a = 1;
> if (a > 0)
> {
> alert("Spam alert");
> }
> }
> </script>
>
> The resulting code, with "a &gt; 0" is not understood by the browser...

ET is an XML library, so it escapes ">" in text content when it writes
the tree. an XHTML-aware browser has no problems dealing with that, but
I assume you might want to support tag soup parsers like IE6 as well ;-)
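
To make the behaviour concrete, here is a minimal sketch, assuming the `xml.etree.ElementTree` module bundled with current Python versions rather than the standalone elementtree package from the question. The element and its script text are made up for illustration. It shows the escaping on output, and that an XML parser reverses it on input:

```python
import xml.etree.ElementTree as ET

# Illustrative element only; mirrors the script from the question.
script = ET.Element("script", type="text/javascript")
script.text = "if (a > 0) { alert('Spam alert'); }"

# The default (XML) serializer escapes ">" in text content.
xml_out = ET.tostring(script, encoding="unicode")
print(xml_out)

# An XML parser undoes the escaping, so the round trip is lossless.
round_trip = ET.fromstring(xml_out)
print(round_trip.text == script.text)
```

An HTML tag soup parser, on the other hand, treats the contents of a script element as raw text and hands "&gt;" to the JavaScript engine unchanged.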

to write true HTML 4.0, you need an HTML serializer. there's a good one
in Kid (though I don't know how hard it would be to use that one with a
preexisting tree, rather than a Kid "event stream").

another alternative is the HTMLTree class in Ian Bicking's commentary
application:
http://svn.pythonpaste.org/Paste/apps/Commentary/trunk/commentary/dumbpath.py
(you may have to tweak that module somewhat to be able to use it without
elementtidy).
</F>