On Fri, 2004-12-10 at 08:36, harrelson wrote:
> I have a list of about 2500 html escape sequences (decimal) that I need
> to convert to utf-8. Stuff like:

I'm pretty sure this somewhat horrifying code does it, but it is probably
an example of what not to do:

>>> escapeseq = '&#48708;'
>>> uescape = ("\\u%x" % int(escapeseq[2:-1])).decode("unicode_escape")
>>> uescape
u'\ube44'
>>> print uescape
비

(I don't seem to have the font for it, but I think that's right - my
terminal font seems to show it correctly).

I just get the decimal value of the escape (the slice [2:-1] strips the
leading "&#" and the trailing ";"), format it as a Python unicode hex
escape sequence, and tell Python to interpret it as an escaped unicode
string.

>>> entities = ['&#48708;', '&#54665;', '&#44592;', '&#47196;', '&#48372;',
...             '&#45244;', '&#44144;', '&#50640;', '&#50836;', '&#45236;',
...             '&#47732;', '&#44552;', '&#51060;', '&#50620;', '&#47560;',
...             '&#51648;', '&#51104;']
>>> def unescape(escapeseq):
...     return ("\\u%x" % int(escapeseq[2:-1])).decode("unicode_escape")
...
>>> print ' '.join([ unescape(x) for x in entities ])
비 행 기 로 보 낼 거 에 요 내 면 금 이 얼 마 지 잠

--
Craig Ringer
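
For comparison, here is a rough sketch of a more direct route that skips the round trip through an escaped string. It assumes Python 3 (where str is already Unicode and chr() accepts any code point), which is not what the session above uses; the names unescape_decimal and _DECIMAL_ENTITY are made up for illustration. On Python 3.4 and later, the standard library's html.unescape should also handle these references.

```python
import re

# Matches decimal numeric character references such as "&#48708;".
_DECIMAL_ENTITY = re.compile(r"&#(\d+);")


def unescape_decimal(text):
    """Replace each decimal entity in text with the character it names.

    chr() raises ValueError for values above 0x10FFFF, so malformed
    input is not handled here.
    """
    return _DECIMAL_ENTITY.sub(lambda m: chr(int(m.group(1))), text)


# A few of the entities from the list above.
entities = ["&#48708;", "&#54665;", "&#44592;"]
decoded = [unescape_decimal(e) for e in entities]

# Encoding to UTF-8 is a separate, explicit step.
utf8_bytes = "".join(decoded).encode("utf-8")
```

Because the regular expression only touches the "&#NNNN;" pattern, the same function can be run over a whole document rather than one entity at a time, and any surrounding text passes through unchanged.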