Handling some isolated iso-8859-1 characters

Daniel Mahoney Tue, 03 Jun 2008 11:41:08 -0700

I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.


One of the lines I'm finding this problem with contains:
137050  Cleo and I have an anouncement!   "Mlle. =?iso-8859-1?Q?Ana=EFs?="
<[EMAIL PROTECTED]>  Sun, 21 Nov 2004 16:21:50 -0500
<[EMAIL PROTECTED]>              4478    69 Xref:
sn-us rec.pets.cats.community:137050

The interesting part is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
An HTML rendering of what this string should look like would be "Ana&iuml;s".

What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?
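
For context, the string in question has the shape of an RFC 2047 "encoded
word" (charset iso-8859-1, Q encoding), which is how non-ASCII text is
normally carried in message headers, so the headers themselves look intact.
A minimal sketch of decoding it with the standard library's email.header
module might look like the following. It assumes Python 3; the helper names
decode_rfc2047 and to_html are made up for illustration and are not part of
any library.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode any RFC 2047 encoded words in a header value to a str.

    Hypothetical helper: decode_header() does the parsing, and this
    function only joins the decoded pieces back together.
    """
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the declared charset.
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Values with no encoded words come back as plain str.
            pieces.append(chunk)
    return "".join(pieces)


def to_html(text):
    """Replace non-ASCII characters with numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


name = decode_rfc2047("=?iso-8859-1?Q?Ana=EFs?=")
html_name = to_html(name)
```

Here name would hold the text with a real i-diaeresis character, and
html_name would hold "Ana&#239;s", a numeric reference that renders the
same as "Ana&iuml;s". Decoding to text first and converting to HTML only at
output time avoids keeping a hand-written substitution table.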



--
http://mail.python.org/mailman/listinfo/python-list
