> I'm working on a script to download and parse a web page, and it > includes xml symbol notation, such as ' for the ' character. Does > anyone know of a pre-existing python script/lib to convert the xml > notation back to the actual symbol it represents?
If you have this given in an XML file (rather than an HTML file which is not well-formed XML), you could use an XML parser for the entire file. This would automatically unescape character references. Likewise, you can parse it with HTMLParser, which will invoke the handle_charref method for these. If you just want to unescape references, you can use the code in http://effbot.org/zone/re-sub.htm HTH, Martin -- http://mail.python.org/mailman/listinfo/python-list