On Jan 21, 7:08 pm, John Machin wrote:
>
> To replace non-ASCII characters in a UTF-8-encoded string by spaces:
> | >>> u8 = ' and 25\xc2\xb0F'
> | >>> u = u8.decode('utf8')
> | >>> ''.join([chr(ord(c)) if c <= u'\x7f' else ' ' for c in u])
> | ' and 25 F'
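For readers on Python 3, the quoted approach might be sketched as below. This is only an illustration under assumptions: the function name `replace_non_ascii` and its `filler` parameter are hypothetical and do not appear in the thread, and the input is assumed to be valid UTF-8 bytes.

```python
# Python 3 sketch of the quoted Python 2 snippet.
# replace_non_ascii and filler are hypothetical names, not from the thread.
def replace_non_ascii(data: bytes, filler: str = " ") -> str:
    """Decode UTF-8 bytes and replace each non-ASCII character with filler."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    # Keep characters up to U+007F as they are; swap everything else for filler.
    return "".join(c if c <= "\x7f" else filler for c in text)


result = replace_non_ascii(b" and 25\xc2\xb0F")
print(repr(result))  # ' and 25 F'
```

The two bytes `\xc2\xb0` decode to a single character (the degree sign), so exactly one space is substituted, matching the quoted output.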
Thanks John for your reply. This is what
> The 0xc2 strongly suggests that you are feeding the beast data encoded
> in UTF-8 while giving it no reason to believe that it is in fact not
> encoded in ASCII. Curiously the first errant byte is a long way (4KB)
> into your data. Consider doing
> print repr(data)
> to see what you've actual
Hi,
I am trying to put some webpages into a MySQL database using Python
(after some processing on the text). If I use Python 2.4.2, it works
without a fuss. However, on Python 2.5, I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
4357: ordinal not in
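
An error of this kind can be reproduced in isolation, which may help confirm the diagnosis. The following is a minimal Python 3 sketch, not the poster's code: `page` is a hypothetical stand-in for the fetched page, and the data is assumed to be UTF-8.

```python
# Python 3 sketch; `page` is a hypothetical stand-in for the web page bytes.
# Ten ASCII bytes followed by "25°F" encoded as UTF-8 (the degree sign is 0xc2 0xb0).
page = b"x" * 10 + "25\u00b0F".encode("utf-8")

try:
    page.decode("ascii")  # treating UTF-8 data as ASCII fails
except UnicodeDecodeError as exc:
    bad_position = exc.start     # index of the first byte that is not ASCII
    bad_byte = page[exc.start]   # the offending byte value

print(bad_position, hex(bad_byte))

# Decoding with the encoding the data is assumed to use succeeds.
text = page.decode("utf-8")
```

Checking the reported position against the raw bytes shows which character triggered the failure and whether the data is plausibly UTF-8.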