Re: Python 2.4 vs 2.5 - Unicode error

2009-01-22 Thread Gaurav Veda
On Jan 21, 7:08 pm, John Machin wrote:
>
> To replace non-ASCII characters in a UTF-8-encoded string by spaces:
> | >>> u8 = ' and 25\xc2\xb0F'
> | >>> u = u8.decode('utf8')
> | >>> ''.join([chr(ord(c)) if c <= u'\x7f' else ' ' for c in u])
> | ' and 25 F'

Thanks John for your reply. This is what
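For readers on Python 3, the quoted recipe could look roughly like the sketch below. It is an adaptation and was not posted in the thread: the helper name `replace_non_ascii` and its `filler` parameter are made up for illustration, and it assumes the input is a `bytes` object that really is valid UTF-8.

```python
def replace_non_ascii(utf8_bytes, filler=" "):
    """Decode UTF-8 bytes and replace each non-ASCII character with `filler`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Decode first, so a multi-byte sequence such as b'\xc2\xb0' (the degree
    # sign) becomes one character and is replaced by one filler, not two.
    text = utf8_bytes.decode("utf-8")
    return "".join(c if ord(c) < 128 else filler for c in text)


# Same sample string as in the quoted session above.
cleaned = replace_non_ascii(b" and 25\xc2\xb0F")
print(repr(cleaned))
```

Decoding before filtering is the important design choice: filtering the raw bytes instead would turn each non-ASCII character into as many fillers as it has bytes.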

Re: Python 2.4 vs 2.5 - Unicode error

2009-01-21 Thread Gaurav Veda
> The 0xc2 strongly suggests that you are feeding the beast data encoded
> in UTF-8 while giving it no reason to believe that it is in fact not
> encoded in ASCII. Curiously the first errant byte is a long way (4KB)
> into your data. Consider doing
> print repr(data)
> to see what you've actual
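A minimal sketch of the diagnosis suggested in the quote, written for Python 3 rather than the Python 2.x used in the thread. The sample `data` and the helper `first_non_ascii` are invented for illustration; the real page content was not posted. It shows that decoding UTF-8 bytes as ASCII fails at the first byte above 0x7f, and that `repr` makes that byte visible.

```python
# Hypothetical sample: ASCII padding followed by a UTF-8 degree sign (0xc2 0xb0).
data = b"x" * 10 + b"25\xc2\xb0F"


def first_non_ascii(raw):
    """Return (position, byte value) of the first non-ASCII byte, or None."""
    for i, b in enumerate(raw):
        if b > 0x7F:
            return i, b
    return None


# Decoding as ASCII fails at the same position the helper reports.
try:
    data.decode("ascii")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start

print(repr(data))                              # escapes make the bytes visible
print(first_non_ascii(data), error_position)

# Naming the actual encoding lets the decode succeed.
decoded = data.decode("utf-8")
```

If the position reported by the exception matches a 0xc2 byte followed by a byte in the range 0x80 to 0xbf, that is consistent with UTF-8 input being treated as ASCII.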

Python 2.4 vs 2.5 - Unicode error

2009-01-21 Thread Gaurav Veda
Hi, I am trying to put some webpages into a MySQL database using Python (after some processing on the text). If I use Python 2.4.2, it works without a fuss. However, on Python 2.5, I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4357: ordinal not in