On Mon, Jan 21, 2002 at 07:04:57PM +1100, Justin Mason wrote:
|
| Craig Hughes said:
|
| > Well, some of the ISO-8859-* should be "far away", shouldn't they?
| > Weren't we treating some russian character set as "far away" not too
| > long ago? How is say, arabic (8859-6) "closer" than russian? Or how is
| > 8859-5 (cyrillic) not russian?
|
| er, (waves hands) ;)

hehe.

| I dunno. To tell the truth, I think the locales test isn't working as
| well as it should; perhaps it should be broken down into a "message is in
| random foreign charset" and a "message is in big5 in particular" test,
| since the main body of spam it needs to catch seems to be in the latter.
|
| (I haven't seen a koi8-r or tis620 spam yet, as far as I know).

I get lots of Korean spam. I don't see much Big5, actually (though
someone I correspond with writes in English, in plain ASCII, yet sets
the charset to Big5; too bad for them. I actually asked him about it
but got no response).

Perhaps there should be an option for listing the charsets that the
user expects to receive. There could be both a "good" list and a "bad"
list, with the charsets for the current locale all considered "near".

While in the long run I think Unicode is a good idea, it certainly
won't help us catch spam like this as easily. (Hmm, actually, if the
characters are in a given range, then we will truly know which
language it is. Then the test could be more accurate, though maybe not
as simple.)

-D

--

"Don't use C; In my opinion, C is a library programming language
not an app programming language." - Owen Taylor (GTK+ developer)
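
For illustration only, here is a rough Python sketch of the two ideas in the message above: a user-configured "good"/"bad" charset list, and guessing the script from the range the characters fall in. Nothing here is SpamAssassin code. The names classify_charset, dominant_script, GOOD_CHARSETS, BAD_CHARSETS and SCRIPT_RANGES are hypothetical, the list contents are made-up examples, and seeding the "good" list from the current locale is left out. Note also that a Unicode range identifies a script rather than a language, so the result is only a hint (Cyrillic is not necessarily Russian, for instance).

```python
# Hypothetical sketch; names and list contents are illustrative assumptions.

# Charsets the user expects to receive ("good") and does not expect ("bad").
GOOD_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}
BAD_CHARSETS = {"euc-kr", "big5", "koi8-r", "tis-620"}

# A few Unicode ranges (inclusive code points), keyed by script name.
SCRIPT_RANGES = {
    "cyrillic": (0x0400, 0x04FF),
    "arabic": (0x0600, 0x06FF),
    "thai": (0x0E00, 0x0E7F),
    "cjk": (0x4E00, 0x9FFF),
    "hangul": (0xAC00, 0xD7A3),
}


def classify_charset(charset, good=GOOD_CHARSETS, bad=BAD_CHARSETS):
    """Return "near", "far", or "unknown" for a declared charset name."""
    name = charset.strip().lower()
    if name in good:
        return "near"
    if name in bad:
        return "far"
    return "unknown"


def dominant_script(text):
    """Return the most frequent script from SCRIPT_RANGES, or None.

    Characters outside the listed ranges (plain ASCII, for example)
    are ignored rather than counted.
    """
    counts = {}
    for ch in text:
        cp = ord(ch)
        for script, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[script] = counts.get(script, 0) + 1
                break
    if not counts:
        return None
    return max(counts, key=counts.get)
```

With lists like these, a message declaring Big5 would be classified "far" from its header alone, while dominant_script run on the decoded body would return None for the plain-ASCII case mentioned above, so a test could tell a mislabelled English message from one that really is in another script.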