On Sun, Feb 19, 2006 at 11:05:49PM -0500, Jeff Breidenbach wrote:
> > The new mhonarc config should do charset conversion if possible,
> > or just output the text as-is in case the charset of the mail is utf8
> > or unknown.
>
> It's not that simple. Leaving a '<' character can cause security
> issues. Anyway, the relevant portion of the mhonarc manual is the
> <CHARSETCONVERTERS> resource. Take a look at mhonarc::htmlize
> versus MHonArc::CharEnt::str2sgml and possibly discuss this on the
> upstream mailing list.
>
> http://www.mhonarc.org/MHonArc/doc/resources/charsetconverters.html
>
> However, I personally recommend that mhonarc be set to convert
> everything to UTF-8, no exceptions. That simplifies a lot of things,
> including the use of mixed languages in a single message. Mixed
> language index pages. Easier linguistic analysis and data mining
> of the HTML. Etc. Bending over backwards for incorrectly labelled
> character sets on inbound email seems more trouble than it is worth.
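Jeff's two points, escaping markup-significant characters such as '<' and converting everything to UTF-8, can be illustrated outside MHonArc. The following is a minimal Python sketch of the idea, not MHonArc's own code path (MHonArc is written in Perl and configures this through the <CHARSETCONVERTERS> resource). The function name `body_to_safe_utf8_html` is hypothetical, and falling back to UTF-8 with replacement characters for unknown or mislabelled charsets is an assumption in the spirit of "no exceptions".

```python
import html
from typing import Optional


def body_to_safe_utf8_html(raw: bytes, declared_charset: Optional[str]) -> bytes:
    """Decode a text body, escape it for HTML, and emit UTF-8 (sketch)."""
    # Assumption: with no declared charset, try UTF-8 first.
    charset = declared_charset or "utf-8"
    try:
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name (LookupError) or bytes that don't match the
        # label (UnicodeDecodeError): fall back to UTF-8 and substitute
        # U+FFFD for undecodable bytes instead of passing them through.
        text = raw.decode("utf-8", errors="replace")
    # Escape &, <, > and quotes so mail content cannot inject markup.
    return html.escape(text).encode("utf-8")
```

For example, a Latin-1 body containing `<script>` comes out as UTF-8 with `&lt;script&gt;`, so the browser renders it as text rather than executing it.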
Thanks. I did look into the manual you quote above, but it didn't work
for me. Now I suddenly got it to work: apparently if you specify a
default but not 'override', the default you supply is simply
ignored... *sigh* In addition, the whole mhonarc resource file is
ignored, without any error message, if you specify
<lang>en_GB.UTF-8</lang>. *doublesigh* It is not helpful at all that
the documentation about CHARSETCONVERTERS doesn't even mention the
existence of the 'override' parameter and what it means; I only
happened to stumble upon it in an example. </rant>

The only problem I now have is that I want to convince mhonarc to
assume UTF-8 if no charset is specified. Within the Debian context,
with our UTF-8 changelogs etc., that's a saner decision than assuming
latin1, which is what MHonArc::UTF8::to_utf8 apparently does, totally
ignoring the current locale. The RFCs say one should assume US-ASCII,
which leaves anything non-7-bit undefined, so it isn't wrong to
assume UTF-8 in that case. If anyone has a tip, please let me know;
otherwise, a different workaround might be easier: adding a
Content-Type header with charset UTF-8 when a mail has no Content-Type
header at all.

> Incidentally, I was probably put on CC: because I'm the mhonarc package
> maintainer. But I should also mention that one of my other hats is
> helping run mail-archive.com, which provides secondary archival service
> for all Debian mailing lists, with permission of the (former) DPL. The service
> is also available for any other Debian team or group currently wrestling
> with mhonarc configuration. So that is a possible fallback if needed.

The context here is the web part of the PTS, which actually isn't a
mailing list. But thanks for the offer, and your help!

--Jeroen

-- 
Jeroen van Wolffelaar
[EMAIL PROTECTED] (also for Jabber & MSN; ICQ: 33944357)
http://Jeroen.A-Eskwadraat.nl
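The header workaround Jeroen proposes, adding a Content-Type header with charset UTF-8 when a mail has none, could run as a preprocessing step before a message reaches mhonarc. Below is a minimal Python sketch, assuming messages are handled one at a time as raw bytes. The function name `add_default_content_type` and the exact header value are hypothetical, and it is an assumption that mhonarc honors this header even without an accompanying MIME-Version header.

```python
import re

# Hypothetical default; assumes UTF-8 is the desired fallback charset.
DEFAULT_CT = b"Content-Type: text/plain; charset=UTF-8"


def add_default_content_type(raw: bytes) -> bytes:
    """Insert DEFAULT_CT into the header block if no Content-Type exists."""
    # The header block ends at the first empty line (LF or CRLF endings).
    m = re.search(rb"\r?\n\r?\n", raw)
    if m is None:
        # No body: the whole message is headers; keep trailing newlines aside.
        header = raw.rstrip(b"\r\n")
        sep = raw[len(header):]
        body = b""
    else:
        header, sep, body = raw[:m.start()], raw[m.start():m.end()], raw[m.end():]
    # Header names are case-insensitive. Anchoring at line start means
    # continuation lines (which begin with whitespace) and names such as
    # X-Content-Type never match.
    if re.search(rb"(?im)^content-type[ \t]*:", header):
        return raw
    # Reuse the message's own line-ending style for the inserted header.
    eol = b"\r\n" if sep.startswith(b"\r\n") else b"\n"
    return header + eol + DEFAULT_CT + sep + body
```

Working on raw bytes, rather than parsing and re-serializing through Python's `email` package, leaves every other byte of the message untouched, which is usually what an archive wants.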

