John Nagle wrote:
> I'm trying to clean up a bad ASCII string, one read from a
> web page that is supposedly in the ASCII character set but has some
> characters above 127. And I get this:
>
>   File "D:\projects\sitetruth\InfoSitePage.py", line 285, in httpfetch
>     sitetext = sitetext.encode('ascii','replace') # force to clean ASCII
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 29151:
> ordinal not in range(128)
>
> Why is that exception being raised when the codec was told 'replace'?

.encode('ascii') converts unicode strings to str strings. Since you gave it a
str string, it first tried to convert that to a unicode string using the
default codec ('ascii'), just as if you had written

    unicode(sitetext).encode('ascii', 'replace')

That implicit conversion is a decode, and it runs with the default 'strict'
error handling. The 'replace' you passed applies only to the encode step, which
is never reached. That is why you see a UnicodeDecodeError even though you
called .encode().

I think you want something like this:

    sitetext = sitetext.decode('ascii', 'replace').encode('ascii', 'replace')

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
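
For anyone reading this on Python 3, where raw page data is a bytes object rather than a str, the same decode-then-encode round trip might look roughly like the sketch below. The helper name clean_ascii and the sample bytes are made up for illustration; only the 0x92 byte comes from the traceback above.

```python
# Sketch for Python 3, where undecoded page data is a bytes object.
# clean_ascii and the sample data are hypothetical, for illustration only.

def clean_ascii(raw, errors="replace"):
    """Return raw as ASCII-only bytes, handling bad bytes per `errors`."""
    # Step 1: bytes -> text. With "replace", each byte above 127 becomes
    # U+FFFD (the replacement character) instead of raising an exception.
    text = raw.decode("ascii", errors)
    # Step 2: text -> bytes. U+FFFD is not ASCII, so with "replace" the
    # encoder substitutes "?" for it.
    return text.encode("ascii", errors)

sample = b"It\x92s a page"   # 0x92 is the offending byte from the traceback
cleaned = clean_ascii(sample)
print(cleaned)
```

Passing "ignore" instead of "replace" drops the bad bytes rather than marking them with "?". Which is preferable depends on whether you want to see where the damage was.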