Hi,

How does your code deal with entities like "&#39;"?
Thanks,
Ray

Klaus Alexander Seistrup wrote:
> Rares Vernica wrote:
>
>> How can I unescape HTML entities like "&nbsp;"?
>>
>> I know about xml.sax.saxutils.unescape() but it only deals with
>> "&amp;", "&lt;", and "&gt;".
>>
>> Also, I know about htmlentitydefs.entitydefs, but not only this
>> dictionary is the opposite of what I need, it does not have
>> "&nbsp;".
>
> How about something like:
>
> #v+
> #!/usr/bin/env python
> '''dehtml.py'''
>
> import re
> import htmlentitydefs
>
> myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) +
>                   ');')
>
> def dehtml(s):
>     return re.sub(
>         myrx,
>         lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
>         s
>     )
> # end def dehtml
>
> if __name__ == '__main__':
>     import sys
>     print dehtml(sys.stdin.read()).encode('utf-8')
> # end if
>
> #v-
>
> E.g.:
>
> #v+
>
> $ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
> frække frølår
> $
>
> #v-
>
--
http://mail.python.org/mailman/listinfo/python-list
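
As far as I can tell, the quoted dehtml.py only matches names found in name2codepoint, so numeric character references such as "&#39;" or "&#x27;" would pass through unchanged. Below is a minimal sketch of one way to extend the same regex-plus-lookup idea to cover them. It assumes Python 3, where htmlentitydefs is named html.entities and unichr() is chr(); the pattern name _ENTITY_RX and the helper replace() are my own, not part of the original script.

```python
import re
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs

# Matches &name;, &#decimal; and &#xhex; references (assumed pattern,
# broader than the one in the quoted script).
_ENTITY_RX = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')

def dehtml(s):
    """Replace named and numeric character references in s."""
    def replace(m):
        ref = m.group(1)
        try:
            if ref[:2] in ('#x', '#X'):
                return chr(int(ref[2:], 16))   # hexadecimal reference
            if ref.startswith('#'):
                return chr(int(ref[1:]))       # decimal reference
            return chr(name2codepoint[ref])    # named entity
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: leave the text as-is.
            return m.group(0)
    return _ENTITY_RX.sub(replace, s)

if __name__ == '__main__':
    print(dehtml('it&#39;s fr&aelig;kke'))
```

On Python 3.4 and later, the standard library's html.unescape() handles both named and numeric references, so it is probably the simpler choice when it is available.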