Hello, 

Your message is a few months old, but I see no answer to it. I stumbled
upon it while writing an enhanced version of the normalize_charset feature,
so I thought that I could perhaps help.


Jay Sekora wrote
> Hi.  We're running SpamAssassin 3.3.1, and pursuant to some advice I've 
> seen in archives of this list and spamassassin-dev, I am *not* 
> using normalize_charset.

I do not know much about the original bug, but until recently I used Unicode
normalization without observing any problems. Perhaps I was lucky, or did
not look closely enough. However, that is beside the point: whether or not
you use normalization, as long as you need to match non-ASCII patterns you
have to write rules in Unicode anyway, because you cannot reject Unicode
messages. So disabling normalization only makes your situation worse. Not
only do you still have to write rules in UTF-8 (and so risk that they will
be slow), but you also have to write rules for every other character set
that may arrive (and you wrote that your server needs to accept email in all
possible languages, so there would be dozens of different character sets).
That is an inhuman task, and the number of rules, or their complexity, might
slow down your server more than the bug does (if it still exists).
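
To illustrate the point about character sets: the same word arrives as a
different byte sequence in each encoding, so without normalization a rule
needs one pattern per charset. Here is a small Python sketch. The sample
word and the list of charsets are only my own illustration, not anything
taken from SpamAssassin.

```python
# Illustration only: the sample word and the charset list are assumptions,
# chosen to show how one word looks on the wire in several encodings.
WORD = "Привет"  # Russian for "hello"
CHARSETS = ["utf-8", "koi8-r", "windows-1251", "iso8859-5", "cp866"]


def encoded_forms(word, charsets):
    """Return a dict mapping each charset name to the raw bytes of the word."""
    return {cs: word.encode(cs) for cs in charsets}


def decode_to_unicode(raw, charset):
    """Conceptually what normalization does: bytes in, one Unicode form out."""
    return raw.decode(charset)


forms = encoded_forms(WORD, CHARSETS)
for cs, raw in forms.items():
    # Each line shows a different byte sequence for the very same word.
    print(cs, raw.hex())
```

Without normalization, a body rule would have to match every one of those
byte sequences. With it, all of them decode to the same text and a single
pattern is enough.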

In my opinion, anyone who needs to write rules for a multinational server
and for Asian languages cannot get around normalization. Otherwise they have
to stick mostly to ASCII-only rules (which are not much use for Asian
languages).

Another possibility may be to normalize not to UTF-8 but to plain 7-bit
US-ASCII. The currently proposed patch for ASCII normalization also
transliterates non-Latin alphabets. The patch was posted to the dev list, so
impatient and courageous users might want to try it on a non-production
server, but be warned that it is not official code (at least not yet) and
has so far seen very little testing.
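
To give a rough idea of what normalizing to ASCII means, here is a minimal
Python sketch. It is not the patch and does not reproduce it: it uses only
the standard unicodedata module, and the function name fold_to_ascii is made
up for this example. It strips diacritics from Latin letters and simply
drops anything else, so unlike the patch it does not transliterate non-Latin
alphabets.

```python
import unicodedata


def fold_to_ascii(text):
    """Strip diacritics from Latin letters; drop characters with no ASCII base.

    Hypothetical helper for illustration only. Real transliteration of
    non-Latin alphabets needs mapping tables that are not shown here.
    """
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with "ignore" discards the combining marks and any non-ASCII rest.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(fold_to_ascii("Crème brûlée"))  # prints: Creme brulee
```

A rule written in plain ASCII would then match the folded text regardless of
which accents, or which charset, the sender used.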

Ivo

--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Current-best-practices-around-normalize-charset-tp105840p108513.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
