Re: Python and encodings drives me crazy

John Machin Mon, 20 Jun 2005 16:41:19 -0700

Oliver Andrich wrote:
> 2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>:
> 
>>It does, as long as headline and caption *can* actually be encoded as
>>macroman. After you decode headline from utf-8 it will be unicode and
>>not all unicode characters can be mapped to macroman:
>>
>>
>> >>> u'\u0160'.encode('utf8')
>> '\xc5\xa0'
>> >>> u'\u0160'.encode('latin2')
>> '\xa9'
>> >>> u'\u0160'.encode('macroman')
>>Traceback (most recent call last):
>>  File "<stdin>", line 1, in ?
>>  File "D:\python\2.4\lib\encodings\mac_roman.py", line 18, in encode
>>    return codecs.charmap_encode(input,errors,encoding_map)
>>UnicodeEncodeError: 'charmap' codec can't encode character u'\u0160' in position 0: character maps to <undefined>
> 
> 
> Yes, this and the coercion problems Diez mentioned were the problems I
> faced. Now I have written a little cleanup method that removes the
> bad characters from the input


By "bad characters", do you mean characters that are in Unicode but not 
in MacRoman?

By "removes the bad characters", do you mean "deletes", or do you mean 
"substitutes one or more MacRoman characters"?

If all you want to do is torch the bad guys, you don't have to write "a 
little cleanup method".

To leave a tombstone for the bad guys:

>>> u'abc\u0160def'.encode('macroman', 'replace')
'abc?def'

To leave no memorial, only a cognitive gap:

>>> u'The Good Soldier \u0160vejk'.encode('macroman', 'ignore')
'The Good Soldier vejk'
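
If "removes" turns out to mean "substitutes", one possible middle path between the tombstone and the gap is to fall back to the base letter when a character has no MacRoman equivalent. What follows is only a sketch, written for Python 3 (where str is already Unicode and encode returns bytes). The function name to_macroman is made up for illustration, and using NFKD decomposition from the standard unicodedata module is an assumption about what a reasonable substitute looks like, not a description of the cleanup method mentioned above.

```python
import unicodedata


def to_macroman(text):
    """Encode text as MacRoman, substituting base letters where possible.

    Hypothetical helper: characters MacRoman can represent are encoded
    as-is; anything else is decomposed (NFKD) and only the parts that
    MacRoman can represent are kept. If nothing survives, '?' is used.
    """
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode('mac_roman'))
        except UnicodeEncodeError:
            # e.g. U+0160 decomposes to 'S' followed by a combining caron;
            # 'ignore' drops the caron and keeps the 'S'.
            decomposed = unicodedata.normalize('NFKD', ch)
            fallback = decomposed.encode('mac_roman', 'ignore')
            pieces.append(fallback or b'?')
    return b''.join(pieces)


print(to_macroman('The Good Soldier \u0160vejk'))
```

Characters that MacRoman does have, such as e-acute, are left alone; the decomposition only kicks in for the ones that would otherwise raise UnicodeEncodeError.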

Do you *really* need to encode it as MacRoman? Can't the Mac app 
understand utf8?
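
To illustrate why utf8 is worth asking about: it can represent every Unicode character, so nothing needs to be torched or replaced on the way out. A small Python 3 sketch comparing a round trip through both encodings (the variable names are illustrative only):

```python
title = 'The Good Soldier \u0160vejk'

# utf-8 can represent every Unicode character, so the round trip is lossless.
utf8_bytes = title.encode('utf-8')
assert utf8_bytes.decode('utf-8') == title

# MacRoman has no slot for U+0160, so 'ignore' silently loses it.
mac_bytes = title.encode('mac_roman', 'ignore')
assert mac_bytes.decode('mac_roman') != title

print(utf8_bytes)
print(mac_bytes)
```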

You mentioned cp850 in an earlier post. What would you be feeding 
cp850-encoded data to that doesn't understand cp1252 and isn't in a museum?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
