Re: Decode email subjects into unicode

Laszlo Nagy Tue, 18 Mar 2008 03:27:23 -0700

Sorry, meanwhile I found that "email.Header.decode_header" can be used 
to convert the subject into unicode:


> def decode_header(self, headervalue):
>     # Only the first (value, encoding) chunk of the header is decoded.
>     val, encoding = decode_header(headervalue)[0]
>     if encoding:
>         return val.decode(encoding)
>     else:
>         return val
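A header can hold several encoded chunks, and the method above looks only at the first one. A minimal sketch that joins all of them, written for Python 3 (where decode_header returns bytes for encoded chunks and str for a header with no encoded words); the name decode_header_value and the ASCII default for undeclared chunks are assumptions for illustration:

```python
from email.header import decode_header


def decode_header_value(headervalue):
    """Decode every chunk of a raw header value and join the results."""
    parts = []
    for val, encoding in decode_header(headervalue):
        if isinstance(val, bytes):
            # Chunks with no declared charset are treated as ASCII (assumption);
            # bytes that do not fit are replaced instead of raising.
            parts.append(val.decode(encoding or "ascii", errors="replace"))
        else:
            # A header without encoded words comes back as str already.
            parts.append(val)
    return "".join(parts)
```

Note that an unknown charset name in the header would still raise LookupError here; this sketch does not handle that case.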

However, there are malformed emails and I have to put them into the 
database. What should I do with this:


Return-Path: <[EMAIL PROTECTED]>
X-Original-To: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from 195.228.74.135 (unknown [122.46.173.89])
    by shopzeus.com (Postfix) with SMTP id F1C071DD438;
    Tue, 18 Mar 2008 05:43:27 -0400 (EDT)
Date: Tue, 18 Mar 2008 12:43:45 +0200
Message-ID: <[EMAIL PROTECTED]>
From: "Euro Dice Casino" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: With 2500 Euro of Welcome Bonus you cant miss the chance!
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit



There is no encoding given in the subject but it contains 0x92. When I 
try to insert this into the database, I get:

ProgrammingError: invalid byte sequence for encoding "UTF8": 0x92

All right, this was probably a spam email and I should simply discard 
it. The spammer probably used this special character to prevent mail 
filters from detecting "can't" and "2500". But I guess there will be 
other important (ham) emails with bad encodings. How should I handle this?
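One possible approach, sketched under assumptions: when a header carries raw bytes with no declared charset, try a short list of codecs in order and fall back to one that cannot fail. In Windows-1252 the byte 0x92 is the right single quotation mark (U+2019), which fits the "can't" in this subject. The function name bytes_to_text and the codec order are hypothetical choices, not a standard recipe, and the guess can be wrong for mail written in other legacy charsets. The sketch is in Python 3.

```python
def bytes_to_text(raw, codecs=("utf-8", "cp1252")):
    """Decode raw header bytes by trying each codec in turn."""
    for codec in codecs:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            # This codec rejects the input; try the next candidate.
            continue
    # latin-1 maps every byte to a code point, so this never raises.
    return raw.decode("latin-1")


subject = bytes_to_text(b"you can\x92t miss the chance!")
# Re-encode as UTF-8 before handing the value to the database.
utf8_subject = subject.encode("utf-8")
```

The result is always valid text, so encoding it as UTF-8 should avoid the "invalid byte sequence" error, at the cost of occasionally storing a wrongly guessed character.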

Thanks,

Laszlo

-- 
http://mail.python.org/mailman/listinfo/python-list
