On Aug 26, 4:13 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
> Adrian Smith wrote:
> > Can anyone tell me how to get rid of smart quotes in html using
> > Python? I've tried variations on
> > stuff = string.replace(stuff, "\“", "\""), but to no avail, presumably
> > because they're not standard ASCII.
>
> Convert the string to unicode. For that you have to know its encoding. I
> assume UTF-8:
>
> >>> s = "a “smart quote” example"
> >>> u = s.decode("utf-8")
>
> Now you can replace the quotes (I looked up the codes in Wikipedia):
>
> >>> u.replace(u"\u201c", "").replace(u"\u201d", "")
> u'a smart quote example'
>
> Alternatively, if you have many characters to remove, translate() is more
> efficient:
>
> >>> u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
> u'a smart quote example'
>
> If necessary, convert the result back to the original encoding:
>
> >>> clean = u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
> >>> clean.encode("utf-8")
> 'a smart quote example'
>
> Peter
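For readers on Python 3, here is a minimal sketch of the same idea. It is not from the thread, and the function names are hypothetical. In Python 3 `str` is already Unicode, so the decode step only applies when you start from `bytes` (UTF-8 is assumed here, as in Peter's example). `str.maketrans` builds a translation table that can either map the curly quotes to straight ASCII quotes (what the original `replace()` attempt was after) or delete them (what the session above does).

```python
# Translation table mapping curly quotes to their ASCII equivalents.
# str.maketrans accepts a dict keyed by single characters or code points.
STRAIGHTEN = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})

# Table that deletes the same four characters: mapping to None removes them.
DELETE = dict.fromkeys([0x201C, 0x201D, 0x2018, 0x2019])


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes (hypothetical helper)."""
    return text.translate(STRAIGHTEN)


def strip_quotes(text: str) -> str:
    """Remove curly quotes entirely (hypothetical helper)."""
    return text.translate(DELETE)


if __name__ == "__main__":
    # Simulate UTF-8 bytes as they might be read from an HTML file.
    raw = "a \u201csmart quote\u201d example".encode("utf-8")
    text = raw.decode("utf-8")  # bytes -> str; encoding assumed to be UTF-8
    print(straighten_quotes(text))
    print(strip_quotes(text))
```

Running it prints `a "smart quote" example` and then `a smart quote example`. If the HTML spells the quotes as entities such as `&ldquo;` rather than literal characters, the table above will not match them, so they would need separate handling.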
Brilliant, thanks!

-- 
http://mail.python.org/mailman/listinfo/python-list