Re: [SAtalk] where's the F{ine}M regarding ok_locales?

dman Mon, 21 Jan 2002 07:59:36 -0800

On Mon, Jan 21, 2002 at 07:04:57PM +1100, Justin Mason wrote:
| 
| Craig Hughes said:
| 
| > Well, some of the ISO-8859-* should be "far away", shouldn't they? 
| > Weren't we treating some russian character set as "far away" not too
| > long ago?  How is say, arabic (8859-6) "closer" than russian?  Or how is
| > 8859-5 (cyrillic) not russian?
| 
| er, (waves hands) ;)


hehe.

| I dunno.  To tell the truth, I think the locales test isn't working as
| well as it should; perhaps it should be broken down into a "message is in
| random foreign charset" and a "message is in big5 in particular" test,
| since the main body of spam it needs to catch seems to be in the latter.
| 
| (I haven't seen a koi8-r or tis620 spam yet, as far as I know).

I get lots of Korean spam.  I don't see much Big5, actually (though
someone I know writes in English, plain ASCII, yet sets his charset to
Big5; too bad for him.  I actually asked him about it but got no
response).

Perhaps there should be an option for listing charsets that the user
expects to receive.  There could be both a "good" list and a "bad"
list, with the charsets for the current locale all considered "near".
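
A rough sketch of how such a pair of lists might be checked, written
in Python for brevity rather than SpamAssassin's Perl.  The function
name, the list contents and the three return values are all made up
for illustration; none of this is an existing SpamAssassin option.

```python
from email import message_from_string


def classify_charsets(raw_message, good, bad):
    """Return 'bad', 'good' or 'unknown' for the charsets a message declares.

    `good` and `bad` are hypothetical user-supplied lists of charset names.
    """
    msg = message_from_string(raw_message)
    # get_charsets() yields one entry per MIME part; None means undeclared.
    declared = {c.lower() for c in msg.get_charsets() if c}
    good = {c.lower() for c in good}
    bad = {c.lower() for c in bad}

    if declared & bad:
        return "bad"      # any part in a listed "bad" charset
    if declared and declared <= good:
        return "good"     # every declared charset is expected
    return "unknown"      # nothing declared, or a charset on neither list


if __name__ == "__main__":
    sample = "Content-Type: text/plain; charset=Big5\n\nhello\n"
    print(classify_charsets(sample, ["iso-8859-1", "us-ascii"],
                            ["big5", "euc-kr"]))
```

Note that this only looks at the declared charset, so the plain-ASCII
message labelled as Big5 would still land on the "bad" side.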

While in the long run I think Unicode is a good idea, it certainly
won't help us catch spam like this as easily.  (Hmm, actually, if the
characters are in a given range, then we will truly know which
language it is.  Then the test could be more accurate, though maybe
not as simple.)
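
The range idea could be sketched roughly as below, again in Python.
The table is a small hand-picked subset of Unicode blocks and the
function name is hypothetical.  Strictly, a code point range identifies
a script rather than a language (Han characters are shared by Chinese
and Japanese, for instance), so this is only an approximation of the
test described above.

```python
from collections import Counter

# A few Unicode blocks, chosen for illustration only; not exhaustive.
SCRIPT_RANGES = [
    (0x0400, 0x04FF, "cyrillic"),
    (0x0600, 0x06FF, "arabic"),
    (0x0E00, 0x0E7F, "thai"),
    (0x3040, 0x30FF, "kana"),    # hiragana and katakana
    (0x4E00, 0x9FFF, "han"),     # CJK unified ideographs
    (0xAC00, 0xD7A3, "hangul"),  # precomposed Hangul syllables
]


def script_of(ch):
    """Return the script label for one character, or None if not in the table."""
    cp = ord(ch)
    for low, high, name in SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    return None


def dominant_script(text):
    """Return the most frequent listed script in text, or None if there is none."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(dominant_script("Привет, мир"))
```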

-D

-- 

"Don't use C;  In my opinion,  C is a library programming language
 not an app programming language."  - Owen Taylor (GTK+ developer)


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
