Hello Chr.,

Wednesday, January 14, 2004, 11:33:35 AM, you wrote:

CvS> Does somebody have/know a rule to catch 'unnecessary encodings'?

Define "unnecessary."  Some are valid, some are obfuscation attempts.
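
One possible working definition, offered only as a sketch: an encoded-word is "unnecessary" when its decoded payload turns out to be plain ASCII, since such text could have been sent unencoded. The Python below illustrates that check on the sample subject from the original question. The function name unnecessary_encoded_words is hypothetical; decode_header comes from Python's standard email.header module. This is not a SpamAssassin rule, and legitimate mailers sometimes encode ASCII too.

```python
from email.header import decode_header


def unnecessary_encoded_words(raw_header):
    """Return (charset, text) pairs for encoded-words whose payload is pure ASCII.

    Assumption: "unnecessary" means the decoded bytes are all below 0x80.
    """
    hits = []
    for payload, charset in decode_header(raw_header):
        # Unencoded parts come back with charset None; skip them.
        if charset is None:
            continue
        # Encoded parts come back as bytes with a lowercased charset name.
        if all(byte < 0x80 for byte in payload):
            hits.append((charset, payload.decode("ascii")))
    return hits


sample = "=?ISO-8859-1?B?RG8geW91cnNlbGYgYSBmYXZvciEgTG9vayBhdCB0aGlz?="
print(unnecessary_encoded_words(sample))
# -> [('iso-8859-1', 'Do yourself a favor! Look at this')]
```

A subject that really needs the charset, such as one containing an accented character, yields an empty list under this definition.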

I use the following rules (see my personal rules pages on the exit0.us
wiki, and note that my scores are based on a required_hits of 9.0):

header    RM_ft_Iso8859           From:raw =~ /iso-8859-1/i
describe  RM_ft_Iso8859           Sender references ISO standard, often spamsign
score     RM_ft_Iso8859           0.100  # 43s/48h of 68055 corpus; 45s/39h of 81383 corpus
header    RM_ft_USAscii           From:raw =~ /us-ascii/i
describe  RM_ft_USAscii           From header specifies display in US-ascii, unnecessary unless spam hides subject
score     RM_ft_USAscii           0.100  # type=spamp - 0s/0h of 39283 corpus; 1s/0h of 81383 corpus
header    RM_ft_KS5601            From:raw =~ /\=\?ks_c_5601\-1987\?/i
describe  RM_ft_KS5601            From header specifies display in Korean?, unnecessary unless spam hides subject
score     RM_ft_KS5601            0.500  # type=spamp - 5s/0h of 81383 corpus

CvS> I saw a mail with the following subject:
CvS> ENCODED: Subject:
CvS> =?ISO-8859-1?B?RG8geW91cnNlbGYgYSBmYXZvciEgTG9vayBhdCB0aGlz?=

header    RM_st_iso8859          Subject:raw =~ /iso-8859-1/i
describe  RM_st_iso8859          Subject specifies display in ISO-8859
score     RM_st_iso8859          1.978  # 3521s/35h of 81383 corpus
header    RM_st_iso8859x2        Subject:raw =~ /iso-8859-1.{1,80}iso-8859-1/i
describe  RM_st_iso8859x2        Subject specifies display in ISO-8859, twice
score     RM_st_iso8859x2        1.000  # adds to RM_sx_iso8859; 181s/6h of 81383 corpus
header    RM_st_KS5601           Subject:raw =~ /\=\?ks_c_5601\-1987\?/i
describe  RM_st_KS5601           Subject specifies display in Korean?, unnecessary unless spam hides subject
score     RM_st_KS5601           1.320  # 32s/0h of 81383 corpus
header    RM_st_USAscii          Subject:raw =~ /us-ascii/i
describe  RM_st_USAscii          Subject specifies display in US-ascii, unnecessary unless spam hides subject
score     RM_st_USAscii          0.675  # 27s/3h of 81383 corpus
                                        # ham: MS Passport.com
header    RM_st_utf8             Subject:raw =~ /utf-8/i
describe  RM_st_utf8             Subject specifies display in utf-8
score     RM_st_utf8             1.000  # 3s/0h of 74872 corpus
header    RM_st_windows1251      Subject:raw =~ /windows-1251/i
describe  RM_st_windows1251      Subject specifies display in windows-1251
score     RM_st_windows1251      3.000  # FN: 22s/0h of 81383 corpus
header    RM_st_windows1255      Subject:raw =~ /windows-1255/i
describe  RM_st_windows1255      Subject specifies display in windows-1255
score     RM_st_windows1255      0.500  # 1s/0h of 74872 corpus
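
To see which of the Subject patterns above would fire on a given raw (undecoded) subject line without running SpamAssassin, a rough Python sketch follows. It copies the regular expressions from the Subject rules verbatim and assumes Python's re module treats these simple patterns the same way Perl does. The names SUBJECT_RULES and matching_rules are hypothetical, and scores are deliberately not modeled.

```python
import re

# Patterns copied from the Subject:raw rules in this message.
SUBJECT_RULES = {
    "RM_st_iso8859": r"iso-8859-1",
    "RM_st_iso8859x2": r"iso-8859-1.{1,80}iso-8859-1",
    "RM_st_KS5601": r"\=\?ks_c_5601\-1987\?",
    "RM_st_USAscii": r"us-ascii",
    "RM_st_utf8": r"utf-8",
    "RM_st_windows1251": r"windows-1251",
    "RM_st_windows1255": r"windows-1255",
}


def matching_rules(raw_subject):
    """Return the names of the rules whose pattern matches the raw subject."""
    return [
        name
        for name, pattern in SUBJECT_RULES.items()
        # re.IGNORECASE mirrors the /i flag on the original rules.
        if re.search(pattern, raw_subject, re.IGNORECASE)
    ]


sample = "=?ISO-8859-1?B?RG8geW91cnNlbGYgYSBmYXZvciEgTG9vayBhdCB0aGlz?="
print(matching_rules(sample))
# -> ['RM_st_iso8859']
```

A subject carrying two iso-8859-1 encoded-words within 80 characters of each other would match both RM_st_iso8859 and RM_st_iso8859x2.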

Bob Menschel




