André wrote:

> I'm using elementtree to process some html files, by building a tree,
> manipulating it, and writing it back.  One problem I encounter is that
> elementtree converts some symbols in an unwanted way.  For example, the
> symbol ">" is converted to "&gt;".  This is fine in html code, but not
> if the page includes some script like the example below
> 
> <script type="text/javascript">
> function init() {
> var a = 1;
> if (a > 0)
>    {
>    alert("Spam alert");
>    }
> }
> </script>
> 
> The resulting code, with "a &gt; 0" is not understood by the browser...

ET is an XML library, so it escapes ">" as "&gt;" when it writes text.  an 
XHTML-aware browser has no problems dealing with that, but I assume you 
might want to support tag soup parsers like IE6 as well ;-)
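
a minimal sketch of the behaviour, assuming the ElementTree that ships with 
current Python as xml.etree.ElementTree (the element and the script text 
are made up for illustration, not taken from the original page):

```python
import xml.etree.ElementTree as ET

# hypothetical element mirroring the script block from the question
script = ET.Element("script", type="text/javascript")
script.text = "if (a > 0) { alert('Spam alert'); }"

# the default (XML) serializer escapes ">" in text, which is valid XML
# but is not what a tag soup parser expects inside <script>
xml_output = ET.tostring(script, encoding="unicode")
print(xml_output)
```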

to write true HTML 4.0, you need an HTML serializer.  there's a good one 
in Kid (though I don't know how hard it would be to use that one with a 
preexisting tree, rather than a Kid "event stream").
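
if your ElementTree is recent enough (an assumption about your setup; this 
applies to the version bundled with current Python as 
xml.etree.ElementTree), the serializer itself accepts method="html", which 
writes the contents of script and style elements without escaping them.  a 
sketch on a made-up tree:

```python
import xml.etree.ElementTree as ET

# hypothetical tree: a script element next to an ordinary paragraph
body = ET.Element("body")
script = ET.SubElement(body, "script", type="text/javascript")
script.text = "if (a > 0) { alert('Spam alert'); }"
para = ET.SubElement(body, "p")
para.text = "1 > 0"

# method="html" leaves script/style text as-is, but still escapes
# text in other elements
html_output = ET.tostring(body, encoding="unicode", method="html")
print(html_output)
```

older ElementTree releases don't have this option, in which case one of 
the external serializers is still the way to go.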

another alternative is the HTMLTree class in Ian Bicking's commentary
application:

http://svn.pythonpaste.org/Paste/apps/Commentary/trunk/commentary/dumbpath.py

(you may have to tweak that module somewhat to be able to use it without 
elementtidy).

</F>

