Jason Baker <[EMAIL PROTECTED]> writes:

> My company is both in Korea and in Canada, so we tend to get a lot of 
> collateral spam from Korean spamhouses AND legitimate mail.
> 
> One point I haven't seen yet in the ruleset is that there's a law in
> Korea that UCE (or perhaps even UBE) must have a subject header
> denoting it.  I don't read/speak Korean, so I have no idea what
> exactly it is, but the characters are: 광고
> 
> (hope that comes through)
> 
> It may be a good basis for a very focused spam rule.  I've seen it
> inside both () and [], but always at the front of the line.

Here are the strings I found more than once inside matching parens,
square/angle brackets, and braces.  A "*" means zero or more of the
preceding character.

INDEX   COUNT   STRING                    NOTES
-----   -----   -----------------------   -----------------------------
1       26      b1 a4 20* b0 ed           the 20 is a space
2       3       c8 ab ba b8               variant of #5 ???
3       3       bc ba c0 ce b1 a4 b0 ed   similar to #1
4       3       b1 a4 2e b0 ed            similar to #1 (2e or '.' replaces 20)
5       2       c1 a4 ba b8               variant of #2 ???

As best I can tell, your string was "SPACE ea b4 91 ea b3 a0" which
bears zero resemblance to any of the above, so I hope yours got
corrupted on the way here.  (Dude, don't send unquoted binary!)
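
One possibility worth ruling out is that the two byte strings are the
same text in different encodings.  A minimal sketch, assuming Python 3
and its bundled utf-8 and euc_kr codecs (all variable names here are
mine and purely illustrative):

```python
# Assumption: the quoted bytes are UTF-8 and the corpus Subject lines are EUC-KR.

def hexdump(data: bytes) -> str:
    """Format bytes the same way as the table: lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

# Bytes as quoted from the message (leading SPACE dropped).
reported = bytes.fromhex("ea b4 91 ea b3 a0")

# String #1 from the table, without the optional 20 (space).
string_1 = bytes.fromhex("b1 a4 b0 ed")

# Decode the reported bytes as UTF-8, then re-encode the text as EUC-KR.
text = reported.decode("utf-8")
as_euc_kr = text.encode("euc_kr")

print("reported as EUC-KR:", hexdump(as_euc_kr))
print("same as string #1: ", as_euc_kr == string_1)

# The other table entries can be decoded the same way for a Korean
# reader to check; errors="replace" avoids an exception on odd input.
table = {
    2: "c8 ab ba b8",
    3: "bc ba c0 ce b1 a4 b0 ed",
    4: "b1 a4 2e b0 ed",
    5: "c1 a4 ba b8",
}
decoded = {
    index: bytes.fromhex(hexstr).decode("euc_kr", errors="replace")
    for index, hexstr in table.items()
}
```

If the comparison comes out true, the quoted string was not corrupted,
just sent as UTF-8 while the Subject lines in the corpus are EUC-KR.
Printing `text` or `decoded` on a UTF-8 terminal shows the characters
for someone who reads Korean to verify.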

Given that I can't display Korean, it's hard to know what means what.
Strings #2 and #5 look like they could be related (c1 + 7 = c8,
a4 + 7 = ab).  These could all be variations in capitalization or
something like that.

Here's my first pass at a regular expression (lightly tested),
combining strings #1, #3, and #4.

header KOREAN_UCE_SUBJECT       Subject =~ /[({[<] *(\xbc\xba\xc0\xce)?\xb1\xa4( *|\x2e)\xb0\xed *[)}\]>]/
describe KOREAN_UCE_SUBJECT     Subject has Korean unsolicited email denotation
score KOREAN_UCE_SUBJECT        2.0
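
For anyone who wants to poke at the pattern outside SpamAssassin, here
is a rough sketch of the same expression in Python's re module.  It
assumes the Subject arrives as raw 8-bit EUC-KR bytes (not a MIME
encoded-word); the sample subjects and the hits() helper are made up
for illustration and do not come from any corpus.

```python
import re

# Same pattern as the rule above, as a bytes regex.  The "[" inside the
# first character class is escaped for Python; the meaning is unchanged.
KOREAN_UCE_SUBJECT = re.compile(
    rb"[({\[<] *(\xbc\xba\xc0\xce)?\xb1\xa4( *|\x2e)\xb0\xed *[)}\]>]"
)

def hits(subject: bytes) -> bool:
    """Return True if the raw Subject bytes match the rule's pattern."""
    return KOREAN_UCE_SUBJECT.search(subject) is not None

# Hypothetical subjects, one per variant the rule is meant to cover.
should_hit = [
    b"[\xb1\xa4\xb0\xed] special offer",        # string #1, no space
    b"(\xb1\xa4 \xb0\xed) special offer",       # string #1, one space
    b"<\xbc\xba\xc0\xce\xb1\xa4\xb0\xed> hi",   # string #3
    b"{\xb1\xa4.\xb0\xed} hi",                  # string #4
]

# Hypothetical subjects the rule should leave alone.
should_miss = [
    b"\xb1\xa4\xb0\xed special offer",          # no surrounding brackets
    b"[\xc1\xa4\xba\xb8] newsletter",           # string #5, not in the rule
    b"[plain ascii] hello",
]

for subject in should_hit + should_miss:
    print(hits(subject), subject)
```

Note that the pattern does not require the opening and closing
brackets to be the same kind, and it is not anchored to the start of
the Subject, so it will also hit a tag that appears mid-line.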

Is this right?  I don't have that many Korean messages in my spam corpus
and none in my nonspam corpus, so someone must check the meaning of this
string before I check it in.  I'd like to know more about the other two
strings as well.

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk