Re: encode/decode misunderstanding

Tim Arnold Mon, 30 Jul 2007 07:32:38 -0700

"Diez B. Roggisch" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Tim Arnold schrieb:
>> Hi, I'm beginning to understand the encode/decode string methods, but I'd 
>> like confirmation that I'm still thinking in the right direction:
>>
>> I have a file of latin1 encoded text. Let's say I put one line of that 
>> file into a string variable 'tocline', as follows:
>> tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'
>>
>> import codecs
>> tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
>> tocline = tocline.decode('latin1','replace')
>> tocFile.write(tocline)
>> tocFile.close()
>>
>> What I think is that tocFile is wrapped to ensure that anything written 
>> to it is encoded as utf8.
>> I decode the latin1 string into Python's internal unicode representation, 
>> and that gets written out as utf8.
>>
>> Questions:
>> What exactly is tocline when it's read in with that \xe9 and \xf3 in 
>> the string? A latin1 encoded string?
>
> Yes. A simple, pure byte-string, that happens to contain bytes which under 
> the latin1-encoding are "correct".
>
>> Is my method the right way to write such a line out to a file with utf8 
>> encoding?
>
> Yes.
>
>> If I read in the latin1 file using
>> codecs.open(filename,encoding='latin1') and write out the utf8 file by 
>> opening with
>> codecs.open(othername,encoding='utf8'), would I no longer have a 
>> problem --  I could just read in latin1 and write out utf8 with no more 
>> worries about encoding?
>
> As long as you use only unicode objects and don't mix bytestrings in with 
> them, you should be fine, yes.
>
> Diez


Wow, I was thinking correctly about encoding! Time for a beer!
Diez, thanks very much for confirming my thoughts.
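
For the archive, here is a minimal sketch of the round trip confirmed above.
It is written so that it also runs on Python 3, which is an assumption on my
part (the code quoted in this thread is Python 2): the byte string carries a
b'' prefix, and io.open stands in for codecs.open, taking the same encoding
argument. The helper name latin1_to_utf8 is made up for illustration.

```python
import io

# The line as it sits in the latin1 file: plain bytes, one byte per character.
tocline = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decoding turns the latin1 bytes into a unicode string.
text = tocline.decode('latin1')

# Encoding turns the unicode string into utf8 bytes; each accented
# character becomes two bytes (\xe9 -> \xc3\xa9, \xf3 -> \xc3\xb3).
utf8_bytes = text.encode('utf8')


def latin1_to_utf8(src_path, dst_path):
    """Copy a latin1 file to a utf8 file (hypothetical helper).

    Both files are opened with an explicit encoding, so only unicode
    strings pass between them and no bytestrings get mixed in.
    newline='' leaves the line endings exactly as they are in the source.
    """
    with io.open(src_path, 'r', encoding='latin1', newline='') as src:
        with io.open(dst_path, 'w', encoding='utf8', newline='') as dst:
            for line in src:
                dst.write(line)
```

Calling latin1_to_utf8('mytoc_latin1.htm', 'mytoc.htm') (the first file name
is a placeholder) would then do the whole conversion without any explicit
decode or encode calls.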

--Tim Arnold 

