On 28 Aug 2006 13:51:58 -0700, [EMAIL PROTECTED] wrote:
>Fredrik Lundh wrote:
>> 3) convert the data to Unicode before passing it to the database
>> interface, and leave it to the interface to convert it to whatever
>> encoding your database uses:
>>
>>      data = ... get encoded string from email ...
>>      text = data.decode("iso-8859-1")
>>      ... write text to database ...
>
>Wouldn't that have to assume that all incoming data is in iso-8859-1?
>If someone sends me an email with chinese characters would that still
>work (I don't know the character set at data insert time)?
>

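Yes, in the sense that the decode will never fail: all byte streams are
valid ISO-8859-1.  The resulting text is only meaningful if the message
really was ISO-8859-1, but encoding it again with the same codec gives
back the original bytes.  For clarity, you may want to use the codec name
"charmap" instead.  It is identical to ISO-8859-1, but implies no actual
encoding to someone reading the source code.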
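
As a rough illustration of the point above, here is a minimal sketch.  It
assumes Python 3, where bytes and str are separate types (the original
thread predates that), and the sample string is made up for the example.

```python
# Every possible byte value decodes under ISO-8859-1, and encoding the
# result with the same codec returns the original bytes unchanged.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")
assert text.encode("iso-8859-1") == all_bytes

# "charmap" with its default mapping produces the same text.
assert all_bytes.decode("charmap") == text

# The decode never fails, but it does not recover the sender's characters.
# Hypothetical input: Chinese text that actually arrived as UTF-8.
utf8_bytes = "中文".encode("utf-8")
mangled = utf8_bytes.decode("iso-8859-1")
assert mangled != "中文"                             # wrong characters
assert mangled.encode("iso-8859-1") == utf8_bytes   # bytes still intact
```

So the data survives the trip, but anything that reads the column as text
will see the wrong characters unless it reverses the same step first.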

Another solution which might be better is to select a data type for this
column which can handle arbitrary bytes.  This will let you avoid mangling
the input at all.  Different databases have different column types
for handling this.  For PostgreSQL, you might want to look at BYTEA.
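
A minimal sketch of the binary-column idea follows.  It uses SQLite's BLOB
type through the standard sqlite3 module as a stand-in, only so that it
runs without a database server; it does not use BYTEA itself.  The table
name, column name and sample message are all hypothetical, and Python 3
is assumed.

```python
import sqlite3

# Hypothetical message body whose charset is not known at insert time.
raw = "中文".encode("gb2312")

conn = sqlite3.connect(":memory:")
# BLOB stores the bytes exactly as given; no character set is involved.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO messages (body) VALUES (?)", (raw,))

stored = conn.execute("SELECT body FROM messages").fetchone()[0]
assert stored == raw  # the same bytes come back, undecoded
conn.close()
```

Decoding can then be put off until the real charset is known, for example
from the message headers.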

Jean-Paul
-- 
http://mail.python.org/mailman/listinfo/python-list
