Dave wrote:

> How can I translate this:
>
> &#103;&#105;
>
> to this:
>
> "gi"

the easiest way is to run it through an HTML or XML parser (depending
on what the source is).  or you could use something like this:

    import re

    def fix_charrefs(text):
        def fixup(m):
            text = m.group(0)
            try:
                if text[:3] == "&#x":
                    # hexadecimal reference, e.g. &#x67;
                    return unichr(int(text[3:-1], 16))
                else:
                    # decimal reference, e.g. &#103;
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
            return text # leave as is
        # note: unichr is Python 2 only; use chr on Python 3
        return re.sub(r"&#?\w+;", fixup, text)

    >>> fix_charrefs("&#103;&#105;")
    'gi'

also see:

    http://effbot.org/zone/re-sub.htm#strip-html

> I've tried urllib.unencode and it doesn't work.

those are HTML/XML character references, not encoded URL characters.

</F>

--
http://mail.python.org/mailman/listinfo/python-list
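
for readers on Python 3, here is a minimal sketch of both routes from the reply above. it assumes the standard library's html.unescape as the "parser" route, and ports the regex approach to chr. the name decode_charrefs is hypothetical (it is not from the original post), and the pattern is narrowed to numeric references only, which is my assumption about what is wanted.

```python
import html
import re


def decode_charrefs(text):
    """Decode numeric character references; leave everything else alone."""
    def fixup(m):
        ref = m.group(0)
        try:
            if ref[:3].lower() == "&#x":
                # hexadecimal reference, e.g. &#x67;
                return chr(int(ref[3:-1], 16))
            # decimal reference, e.g. &#103;
            return chr(int(ref[2:-1]))
        except (ValueError, OverflowError):
            return ref  # malformed or out of range: leave as is
    return re.sub(r"&#\w+;", fixup, text)


# standard-library route: also handles named entities such as &amp;
print(html.unescape("&#103;&#105;"))                   # gi

# regex route: numeric references only, named entities are untouched
print(decode_charrefs("&#x67;&#105; &amp; &#bogus;"))  # gi &amp; &#bogus;
```

html.unescape is probably the simpler choice when named entities should be decoded as well; the regex version may be preferable when only numeric references should change.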