Sorry, meanwhile I found that "email.header.decode_header" can be used to convert the subject into unicode:

> def decode_header(self, headervalue):
>     val, encoding = decode_header(headervalue)[0]
>     if encoding:
>         return val.decode(encoding)
>     else:
>         return val

However, there are malformed emails and I have to put them into the database. What should I do with this:

Return-Path: <[EMAIL PROTECTED]>
X-Original-To: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from 195.228.74.135 (unknown [122.46.173.89])
    by shopzeus.com (Postfix) with SMTP id F1C071DD438;
    Tue, 18 Mar 2008 05:43:27 -0400 (EDT)
Date: Tue, 18 Mar 2008 12:43:45 +0200
Message-ID: <[EMAIL PROTECTED]>
From: "Euro Dice Casino" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: With 2500 Euro of Welcome Bonus you cant miss the chance!
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

There is no encoding given in the subject, but it contains the byte 0x92. When I try to insert this into the database, I get:

ProgrammingError: invalid byte sequence for encoding "UTF8": 0x92

All right, this was probably a spam email and I should simply discard it. The spammer probably used this special character to prevent mail filters from detecting "can't" and "2500". But I guess there will be other important (ham) emails with bad encodings. How should I handle this?

Thanks,

   Laszlo

-- 
http://mail.python.org/mailman/listinfo/python-list
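
One possible way to handle headers like the one above is to decode with a chain of fallbacks, so that the database only ever receives valid text. This is only a minimal sketch, not a definitive answer. It assumes Python 3 (the snippet quoted earlier uses the Python 2 idiom), the names to_text and decode_subject are made up for illustration, and the fallback order (UTF-8, then cp1252) is a guess that may not suit every mailbox. The byte 0x92 is the right single quotation mark in Windows-1252 (cp1252), which is why that codec is tried before the last resort.

```python
from email.header import decode_header


def to_text(raw, declared=None, fallbacks=("utf-8", "cp1252")):
    """Decode header bytes to str, trying the declared charset first.

    Hypothetical helper: the fallback order is an assumption.
    """
    if isinstance(raw, str):
        # decode_header returns str for headers without encoded words
        return raw
    candidates = ([declared] if declared else []) + list(fallbacks)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # wrong guess, or a charset name Python does not know
            continue
    # latin-1 maps every byte to a code point, so this cannot fail
    return raw.decode("latin-1")


def decode_subject(value):
    """Decode every chunk of a header, not only the first one."""
    parts = []
    for chunk, charset in decode_header(value):
        parts.append(to_text(chunk, charset))
    return "".join(parts)


if __name__ == "__main__":
    # Raw bytes with an undeclared 0x92, similar to the subject above
    print(to_text(b"you can\x92t miss the chance!"))
    print(decode_subject("=?iso-8859-1?q?caf=E9?="))
```

Once the value is a str, encoding it as UTF-8 for the database always yields a valid byte sequence. The trade-off is that a wrong guess produces readable but possibly incorrect characters instead of an error, so it may be worth storing the raw bytes alongside the decoded text.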