Adrian Smith wrote:

> Can anyone tell me how to get rid of smart quotes in html using
> Python? I've tried variations on
> stuff = string.replace(stuff, "\“", "\""), but to no avail, presumably
> because they're not standard ASCII.

Convert the string to unicode. For that you have to know its encoding.
I assume UTF-8:

>>> s = "a “smart quote” example"
>>> u = s.decode("utf-8")

Now you can replace the quotes (I looked up the codes in Wikipedia):

>>> u.replace(u"\u201c", "").replace(u"\u201d", "")
u'a smart quote example'

Alternatively, if you have many characters to remove, translate() is
more efficient:

>>> u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
u'a smart quote example'

If necessary, convert the result back to the original encoding:

>>> clean = u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
>>> clean.encode("utf-8")
'a smart quote example'

Peter
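
PS: The sessions above are Python 2. As a rough sketch only, the same decode / translate / encode round trip could look like this on Python 3, where the data arrives as bytes and str is already unicode. Here the smart quotes are mapped to plain ASCII quotes instead of being dropped, which may be closer to what the original replace() attempt wanted. The helper name replace_smart_quotes and the SMART_QUOTES table are made up for illustration, and UTF-8 is still an assumption about the input.

```python
# Hypothetical mapping: code point of each smart quote -> ASCII replacement.
SMART_QUOTES = {
    0x201c: '"',  # left double quotation mark
    0x201d: '"',  # right double quotation mark
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
}


def replace_smart_quotes(data, encoding="utf-8"):
    """Decode bytes, swap smart quotes for ASCII quotes, encode again.

    The encoding defaults to UTF-8, which is an assumption; pass the
    real encoding of your HTML if it differs.
    """
    text = data.decode(encoding)
    return text.translate(SMART_QUOTES).encode(encoding)


raw = "a \u201csmart quote\u201d example".encode("utf-8")
print(replace_smart_quotes(raw))  # b'a "smart quote" example'
```

To remove the characters instead, as in the sessions above, map each code point to None, which is what dict.fromkeys() produces.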