On Saturday 08 December 2007 01:15, Karsten Bräckelmann wrote:
<snip>
> > Ok. My fault I mistook charsets with country codes. But replace se with
> > ru or ch or greek7. The result is the same. You want one charset to be
> > considered as "not ham" and you have to give the whole list to the
> > parameter. And I think it is a long and ugly to read list (see:
> > http://www.iana.org/assignments/character-sets)
>
> Yes, that list indeed is ugly. However, that is *not* what we are
> talking about. The list of valid locales for ok_locales can be found in
> the docs -- and totals 6, including en...

Only 6? Yes, I found it in the docs. (Yeah, I know: RTFM before you ask
around.) I apologize; with only 6 charsets it is not useful to have a
not_ok_locales option.

> > I only want to say that there can be a situation in which you only know
> > that you don't want to consider the XXX charset as an indicator for ham.
>
> Despite its name, ok_locales is *not* about certain charsets being "an
> indicator for ham". The opposite is true. It does not assign a negative
> score. All it does is assigning a positive score for charsets "not in
> the ok list".

Maybe I should have said "an indicator for NOT spam"? Sh.., there are too
many double negations and I'm too tired for that.

> > > Anyway, this whole example is non-realistic as is. As Matt pointed out
> > > in a later post, we are talking character sets here, not languages. In
> > > the world of ok_locales, there is no distinction between en and se,
> > > which is just en to ok_locales...
> >
> > As I say I got confused with it (and be it maybe still).
> >
> > Other question: How does Spamassassin know which charset it should use.
> > Provides it a list of all charsets and compares or does it try it to find
> > the information in the header of the mail or ...?
>
> Unfortunately, I don't know either. Although I'd like to...
>
> As per my counter example above, I do not want CHARSET_FARAWAY and
> friends to score on mail, just because a fellow hacker happens to have
> his original name in his sig or From: header. And it probably doesn't
> come as a surprise, that the example actually is real life. ;)
>
>
> Maybe the devs can briefly explain how the charset is being determined.
> Or at least, where exactly in the code one could find it...
>
> guenther - who is too lazy to dig through all the code right now :)

Bye
Stefan