Jason Baker <[EMAIL PROTECTED]> writes:

> My company is both in Korea and in Canada, so we tend to get a lot of
> collateral spam from Korean spamhouses AND legitimate mail.
>
> One point I haven't seen yet in the ruleset is that there's a law in
> Korea that UCE (or perhaps even UBE) must have a subject header
> denoting it. I don't read/speak Korean, so I have no idea what
> exactly it is, but the characters are: 광고
>
> (hope that comes through)
>
> It may be a good basis for a very focused spam rule. I've seen it
> inside both () and [], but always at the front of the line.
Here are the strings I found more than once inside matching parens,
square/angle brackets, and braces. A "*" means zero or more of the
preceding character.

INDEX  COUNT  STRING                   NOTES
-----  -----  -----------------------  -------------------------------------
1      26     b1 a4 20* b0 ed          the 20 is a space
2      3      c8 ab ba b8              variant of #5 ???
3      3      bc ba c0 ce b1 a4 b0 ed  similar to #1
4      3      b1 a4 2e b0 ed           similar to #1 (2e or '.' replaces 20)
5      2      c1 a4 ba b8              variant of #2 ???

As best I can tell, your string was "SPACE ea b4 91 ea b3 a0", which
bears zero resemblance to any of the above, so I hope yours got
corrupted on the way here. (Dude, don't send unquoted binary!)

Given that I can't display Korean, it's hard to know what means what.
Strings #2 and #5 look like they could be related (c1 + 7 = c8,
a4 + 7 = ab). These could all be variations in capitalization or
something like that.

Here's my first pass at a regular expression (lightly tested),
combining strings #1, #3, and #4:

header   KOREAN_UCE_SUBJECT  Subject =~ /[({[<] *(\xbc\xba\xc0\xce)?\xb1\xa4( *|\x2e)\xb0\xed *[)}\]>]/
describe KOREAN_UCE_SUBJECT  Subject has Korean unsolicited email denotation
score    KOREAN_UCE_SUBJECT  2.0

Is this right? I don't have that many Korean messages in my spam
corpus and none in my nonspam corpus, so someone must check the
meaning of this string before I check it in. I'd like to know more
about the other two strings as well.

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
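
A quick way to sanity-check the byte strings discussed above is to
encode the two quoted characters under both common encodings and run
the proposed pattern over a few raw subjects. The sketch below is
Python rather than a SpamAssassin rule, and it rests on assumptions
that are not in the message: that the quoted characters were sent as
UTF-8, and that the corpus subjects are EUC-KR. The names hexdump and
is_tagged are made up for illustration. The pattern is the one from
the rule, with the "[" inside the character class escaped for
Python's re module.

```python
import re

# The two characters quoted in the original message.
TAG = "광고"

# Assumption: the message carried UTF-8, while the corpus is EUC-KR.
utf8_bytes = TAG.encode("utf-8")
euckr_bytes = TAG.encode("euc_kr")


def hexdump(data: bytes) -> str:
    """Render bytes as space-separated lowercase hex, as in the table."""
    return " ".join(f"{b:02x}" for b in data)


# Same pattern as the proposed KOREAN_UCE_SUBJECT rule, written as a
# bytes regex so it runs against the raw, undecoded Subject header.
KOREAN_UCE_SUBJECT = re.compile(
    rb"[({\[<] *(\xbc\xba\xc0\xce)?\xb1\xa4( *|\x2e)\xb0\xed *[)}\]>]"
)


def is_tagged(raw_subject: bytes) -> bool:
    """Return True if the raw subject bytes contain the bracketed tag."""
    return KOREAN_UCE_SUBJECT.search(raw_subject) is not None


if __name__ == "__main__":
    print(hexdump(utf8_bytes))   # ea b4 91 ea b3 a0
    print(hexdump(euckr_bytes))  # b1 a4 b0 ed

    samples = [
        b"[\xb1\xa4\xb0\xed] sale",            # string #1, no space
        b"(\xb1\xa4 \xb0\xed)",                # string #1, one space
        b"{\xb1\xa4.\xb0\xed}",                # string #4
        b"<\xbc\xba\xc0\xce\xb1\xa4\xb0\xed>",  # string #3
        "[광고]".encode("utf-8"),               # UTF-8 form of the tag
    ]
    for raw in samples:
        print(hexdump(raw), "->", is_tagged(raw))
```

If those assumptions hold, the UTF-8 form of the quoted characters is
the "ea b4 91 ea b3 a0" seen in the message and the EUC-KR form is
string #1, which would suggest an encoding difference rather than
corruption. It also shows that the rule as written only catches the
EUC-KR form: a subject carrying the tag in UTF-8 is not matched.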