BerlinBrown wrote:
> With this code, ignore/replace still generate an error
> 
>                       # Encode to simple ascii format.
>                       field.full_content = field.full_content.encode('ascii', 'replace')
> 
> Error:
> 
> [0/1] 'ascii' codec can't decode byte 0xe2 in position 14317: ordinal not in range(128)
> 
> The document in question is a Wikipedia document.  I believe they use
> latin-1 unicode or something similar.  I thought replace and ignore
> were supposed to replace and ignore?

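Is field.full_content a str or a unicode? You probably haven't decoded
it from a byte string yet. Calling .encode() on a str makes Python
decode it with the default ascii codec first, which is why the error
says "can't decode"; the 'replace' argument only applies to the encode
step, which is never reached.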

>>> field.full_content = field.full_content.decode('utf8', 'replace')
>>> field.full_content = field.full_content.encode('ascii', 'replace')

Why do you want to use ASCII? UTF-8 is great. :-)
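
As a minimal sketch of the decode-then-encode order (the helper name
to_ascii and the sample bytes are made up for illustration, and UTF-8 is
only an assumption about the source encoding), the following should run
unchanged on Python 2.6+ and Python 3. The last assignment shows that
encoding to UTF-8 instead keeps every character:

```python
def to_ascii(raw, source_encoding='utf8'):
    """Hypothetical helper: byte string in, ASCII-only byte string out."""
    # Step 1: bytes -> text. 'replace' turns undecodable bytes into U+FFFD.
    text = raw.decode(source_encoding, 'replace')
    # Step 2: text -> ASCII bytes. 'replace' turns each non-ASCII
    # character into a single '?'.
    return text.encode('ascii', 'replace')


# Assumed sample input: "it's" with a curly apostrophe, whose UTF-8
# encoding starts with the byte 0xe2 named in the error message.
raw = b'it\xe2\x80\x99s'

ascii_bytes = to_ascii(raw)                     # lossy: apostrophe becomes '?'
utf8_bytes = raw.decode('utf8').encode('utf8')  # lossless round trip
```

Here to_ascii(raw) gives b'it?s', while the UTF-8 round trip returns the
original bytes untouched.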
-- 
http://mail.python.org/mailman/listinfo/python-list
