> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as ' for the ' character.  Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

If you have this given in an XML file (rather than an HTML file which
is not well-formed XML), you could use an XML parser for the entire
file. This would automatically unescape character references. Likewise,
you can parse it with HTMLParser, which will invoke the handle_charref
method for these.

If you just want to unescape references, you can use the code in

http://effbot.org/zone/re-sub.htm

HTH,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to