On Friday 07 December 2007 20:42, Karsten Bräckelmann wrote:
> On Fri, 2007-12-07 at 08:38 -0500, Matt Kettler wrote:
> > Stefan Jakobs wrote:
> > > Let's assume you running a mailrelay for a university and your users
> > > are from different countries. Lets assume further on you have no
> > > Swedish people at your university (and you get a lot of spam from
> > > Sweden). Then it would be nice to have a not_ok_locales option, because
> > > you see immediately which locale character set is considered as
> > > possible spam.
>
> Now let's further assume, your students are able to speak English. And
> they are collaborating with an Open Source project, discussing with a
> lot of people from all over the world.
>
> Let's assume, one of them happens to be Swedish. And even though the
> entire communication is English, that ignorant bastard dares to have his
> real name at the bottom of his mail -- which includes Swedish chars.
>
> Do you hear that flushing sound of catching spam?

Do you mean that if I get one false positive, I should throw my spam filter
in the trash can? Of course it can happen that a mail is caught by rules that
were not made for it, especially at universities, where you have a wide range
of different types of mail.

> Swedish chars are a superset of English chars. As are German and many
> others. To see that this is not an artificial, made up example please
> have a look at my real name. :)

OK, my fault: I confused charsets with country codes. But replace se with ru
or ch or greek7 and the result is the same. You want one charset to be
considered "not ham", and you have to give the whole list to the parameter.
I think that list is long and ugly to read
(see: http://www.iana.org/assignments/character-sets).
I only want to say that there can be situations in which all you know is that
you don't want to consider the XXX charset as an indicator for ham.

> > Now that sounds like a valid reason to me.
>
> It doesn't to me...
>
> Anyway, this whole example is non-realistic as is. As Matt pointed out
> in a later post, we are talking character sets here, not languages. In
> the world of ok_locales, there is no distinction between en and se,
> which is just en to ok_locales...

As I said, I got confused about that (and maybe still am).

>   guenther

Another question: how does SpamAssassin know which charset it should use?
Does it go through a list of all charsets and compare, or does it try to find
the information in the header of the mail, or ...?

Greetings
Stefan