On Wed, 2014-03-12 at 19:04 -0700, Ivo Truxa wrote:
> Your message is a few months old, but I see no answer, and stumbled upon it
> when writing an enhanced version of the normalize_charset feature, so
> thought that I could perhaps help.

Thanks! I'm glad to hear of your experiences.

> [R]egardless whether
> you use normalizing or not, as long as you need to match non-ASCII patterns,
> you need to write rules also in Unicode anyway, because you cannot reject
> Unicode messages.

Indeed! And even if you only want to accept messages in English (or
some other ASCII-supported language), nowadays it's not at all uncommon
for messages to have dingbats or printer's quotation marks in them -- or
one of your correspondents might be sitting at a relative's computer or
in an internet cafe somewhere and the subject line might get the Chinese
equivalent of "Re:" prepended to it, or the body might have a disclaimer
in French appended.

> Another possibility may be normalizing, instead to UTF, to plain 7bit
> US-ASCII. The currently proposed patch for ASCII normalizing transliterates
> also non-Latin alphabets. The patch was proposed to the dev list, so
> impatient and courageous users might want to try it on a non-production
> server, but be warned that it is not any official code (at least not now),
> and currently very little tested.

Interesting idea! I searched in the spamassassin-dev archives but I
don't think I found the right patch; could you point me at it? How do
you handle non-alphabetic scripts (like CJK, where a character may have
multiple pronunciations both within and between languages)? Seems like
just normalizing them to U+NNNN might be better than trying to
transcribe them. (And that would let a brave or foolhardy mail
administrator write rules to match patterns seen in, say,
Chinese-language spam even without knowing Chinese, or even without
knowing what language the spam was in.)

Anyway, glad to hear that normalize_charset hasn't been causing you
problems, and for us, normalizing to UTF8 is almost certainly what we
want if it's reasonably safe.

Jay

-- 
Jay Sekora
Linux system administrator and postmaster,
The Infrastructure Group
MIT Computer Science and Artificial Intelligence Laboratory
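
To make the U+NNNN suggestion above concrete, here is a minimal sketch in Python. It is not SpamAssassin code and not the patch from the dev list; the function name escape_non_ascii is made up for illustration. It assumes the text has already been decoded to Unicode, and simply replaces each non-ASCII character with its code point written as U+NNNN.

```python
import re

# Hypothetical helper, for illustration only: not part of SpamAssassin.
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a U+NNNN token.

    ASCII characters pass through unchanged, so the result is plain
    7-bit text that ASCII-only patterns can be matched against.
    """
    return re.sub(
        r"[^\x00-\x7f]",
        lambda m: f"U+{ord(m.group()):04X}",
        text,
    )

# A subject line with two CJK characters after an ASCII prefix.
subject = "Re: \u4f60\u597d"
print(escape_non_ascii(subject))  # Re: U+4F60U+597D
```

A pattern such as `U\+4F60U\+597D` could then be matched against the escaped text without the rule author needing to type or read the original script. Whether this is preferable to transliteration is an open question; the sketch only shows that the mapping itself is mechanical.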