On Friday 07 December 2007 20:42, Karsten Bräckelmann wrote:
> On Fri, 2007-12-07 at 08:38 -0500, Matt Kettler wrote:
> > Stefan Jakobs wrote:
> > > Let's assume you are running a mail relay for a university and your
> > > users are from different countries. Let's assume further that you have
> > > no Swedish people at your university (and you get a lot of spam from
> > > Sweden). Then it would be nice to have a not_ok_locales option, because
> > > you would see immediately which locale character set is considered
> > > possible spam.
>
> Now let's further assume that your students are able to speak English.
> And they are collaborating with an Open Source project, discussing with
> a lot of people from all over the world.
>
> Let's assume one of them happens to be Swedish. And even though the
> entire communication is in English, that ignorant bastard dares to have
> his real name at the bottom of his mail -- which includes Swedish chars.
>
> Do you hear that flushing sound of catching spam?

Do you mean that if I get one false positive, I should throw my spam filter in 
the trash can? Of course it can happen that a mail is caught by rules which 
were not made for it, especially at universities, where you have a great range 
of different types of mail.

> Swedish chars are a superset of English chars. As are German and many
> others. To see that this is not an artificial, made up example please
> have a look at my real name. :)

OK, my fault: I mixed up charsets and country codes. But replace se with ru or 
ch or greek7 and the result is the same. If you want one charset to be 
considered "not ham", you have to give the whole list of all the other charsets 
to the parameter. And I think that is a long list that is ugly to read (see: 
http://www.iana.org/assignments/character-sets).
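
To make the point concrete, here is a small Python sketch (illustration only, 
not SpamAssassin code). It assumes ok_locales behaves as an allow-list over a 
fixed set of locale codes; the set KNOWN_LOCALES below is an assumption for the 
example, and not_ok_locales is only the option proposed in this thread, not an 
existing setting. Turning a one-entry deny-list into the equivalent allow-list 
means spelling out every other entry:

```python
# Illustration only. ok_locales is treated here as an allow-list;
# not_ok_locales is the HYPOTHETICAL deny-list option proposed in this thread.
# KNOWN_LOCALES is an ASSUMED set of locale codes, chosen for the example.
KNOWN_LOCALES = ["en", "ja", "ko", "ru", "th", "zh"]


def ok_locales_for(not_ok: set[str]) -> list[str]:
    """Translate a deny-list into the equivalent allow-list (order preserved)."""
    return [loc for loc in KNOWN_LOCALES if loc not in not_ok]


def ok_locales_line(not_ok: set[str]) -> str:
    """Render the allow-list as a single config line in the ok_locales style."""
    return "ok_locales " + " ".join(ok_locales_for(not_ok))


if __name__ == "__main__":
    # Excluding a single locale forces you to list all the others.
    print(ok_locales_line({"ru"}))
```

With a real list of dozens of entries, the allow-list line grows accordingly, 
which is the readability problem described above.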

I only want to say that there can be situations where all you know is that 
you don't want the XXX charset to count as an indicator for ham.

> > Now that sounds like a valid reason to me.
>
> It doesn't to me...
>
> Anyway, this whole example is unrealistic as is. As Matt pointed out
> in a later post, we are talking character sets here, not languages. In
> the world of ok_locales, there is no distinction between en and se,
> which is just en to ok_locales...

As I said, I got confused by it (and maybe I still am).
>
>   guenther

Another question: how does SpamAssassin know which charset a mail uses? Does it 
have a list of all charsets and compare against it, does it try to find the 
information in the mail headers, or ...?
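
For reference, a message can declare its charsets in two standard places: the 
charset parameter of each MIME part's Content-Type header, and RFC 2047 encoded 
words in headers such as Subject. The Python sketch below collects both with the 
standard library email package. It is only an illustration of where that 
information lives, not how SpamAssassin is implemented, and the function name 
declared_charsets is made up for the example:

```python
import email
from email.header import decode_header


def declared_charsets(raw_message: str) -> set[str]:
    """Collect the charsets a message declares about itself (sketch only).

    Looks at the charset= parameter of every MIME part's Content-Type
    header and at RFC 2047 encoded words in the Subject header.
    """
    msg = email.message_from_string(raw_message)
    found: set[str] = set()
    # walk() yields the message itself and every nested part.
    for part in msg.walk():
        charset = part.get_content_charset()  # lower-cased, or None if absent
        if charset:
            found.add(charset)
    # Encoded words such as =?iso-8859-1?Q?...?= carry their own charset.
    subject = msg.get("Subject")
    if subject:
        for _text, charset in decode_header(subject):
            if charset:
                found.add(charset.lower())
    return found
```

A mail whose body part says charset=KOI8-R and whose Subject is encoded as 
=?iso-8859-1?Q?...?= would yield both koi8-r and iso-8859-1. Declared charsets 
can be missing or wrong, so a real filter may also need to look at the bytes 
themselves.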

Greetings
Stefan

