Hello Larry,

Tuesday, November 25, 2003, 12:34:57 PM, you wrote:

LG> Attached is a custom rule file.  It has been working rather well and
LG> I will be increasing the score from 0.5 to 1.0.  The cf file also has
LG> some rules looking for words obfuscated by pipes.  They have been
LG> working well also.

FYI, here are my mass-check results with your rules, run against my
corpus of 58,857 emails. Each line shows spam hits (s), ham hits (h), and
the score I would initially assign based on those hits, per my algorithm
posted at http://www.exit0.us/index.php/RM_RuleScoring. Where two scores
are listed, the second is the negative alternative discussed below. Most
sites should probably score these lower, and I would probably want to run
a 2-pass or 3-pass GA on them to refine the scores myself.

MY_RBDY_PDS_1P3   -- 375s /  22h -- 1.163
MY_RBDY_PDS_1P4   -- 365s /   5h -- 1.608
MY_RBDY_PDS_1P5   -- 210s /   3h -- 1.700
MY_RBDY_PDS_1P6   -- 165s /   2h -- 0.550
MY_RBDY_PDS_1P7   --  88s /   0h -- 1.880
MY_RBDY_PDS_1P8   -- 121s /   4h -- 1.302
MY_RBDY_PDS_2P2   -- 168s /  14h -- 1.112
MY_RBDY_PDS_2P3   -- 105s /  45h -- 0.228
MY_RBDY_PDS_2P4   -- 311s /  14h -- 1.207
MY_RBDY_PDS_2P5   --  56s /   7h -- 0.700
MY_RBDY_PDS_2P6   -- 161s /   8h -- 1.179
MY_RBDY_PDS_2P7   --  89s /   5h -- 1.148
MY_RBDY_PDS_2P8   --   4s /   5h -- 0.067 or -0.100
MY_RBDY_PDS_3P1   -- 200s /  15h -- 1.125
MY_RBDY_PDS_3P2   -- 173s /  25h -- 6.654
MY_RBDY_PDS_3P3   -- 179s /  58h -- 0.303
MY_RBDY_PDS_3P4   --  74s /  15h -- 0.463
MY_RBDY_PDS_3P5   -- 195s /  12h -- 1.150
MY_RBDY_PDS_3P6   --  43s /   5h -- 0.717
MY_RBDY_PDS_3P7   --   3s /   5h -- 0.050 or -0.125
MY_RBDY_PDS_3P8   --  42s /  49h -- 0.084 or -0.114
MY_RBDY_PDS_4P1   -- 285s /  32h -- 0.864
MY_RBDY_PDS_4P2   -- 417s /  21h -- 1.190
MY_RBDY_PDS_4P3   -- 259s /  82h -- 0.312
MY_RBDY_PDS_4P4   -- 160s /  26h -- 0.593
MY_RBDY_PDS_4P5   --  56s /  17h -- 0.311
MY_RBDY_PDS_4P6   --   7s /   0h -- 0.700
MY_RBDY_PDS_4P7   --   3s /  12h -- 0.023 or -0.300
MY_RBDY_PDS_4P8   --   2s /   0h -- 0.200
MY_RBDY_PDS_5P1   --  84s /  21h -- 0.382
MY_RBDY_PDS_5P3   --  99s / 464h -- 0.021 or -0.464
MY_RBDY_PDS_5P5   --  81s /  12h -- 0.623
MY_RBDY_PDS_6P6   --  99s / 464h -- 0.021 or -0.464
MY_HDR_PDS_1P5    -- 140s /   0h -- 2.400
MY_HDR_PDS_2P1    -- 244s /   3h -- 1.610
MY_HDR_PDS_2P4    -- 176s /  13h -- 1.126
MY_HDR_PDS_3P2    -- 308s /   9h -- 1.308
MY_HDR_PDS_3P3    -- 607s / 528h -- 0.115
MY_HDR_PDS_3P5    -- 108s /   0h -- 2.080
MY_HDR_PDS_3P8    --  73s /   0h -- 1.730
MY_HDR_PDS_4P3    -- 481s / 519h -- 0.093 or -0.108
MY_HDR_PDS_4P4    -- 114s /  13h -- 0.814
MY_HDR_PDS_4P5    --  82s /   0h -- 1.820
MY_HDR_PDS_5P1    -- 171s /   0h -- 2.710
MY_HDR_PDS_6P1    -- 159s /   0h -- 2.590
MY_HDR_PDS_6P2    -- 122s /   9h -- 1.122
MY_BDY_PIPE_S233S --  17s /   0h -- 1.170
MY_BDY_PIPE_S23S  --  35s /   0h -- 1.350
MY_BDY_PIPE_S23C  --  17s /   0h -- 1.170
MY_BDY_PIPE_S24S  --  42s /   0h -- 1.420
MY_BDY_PIPE_S34P  --   0s /   0h -- 0.100
MY_HDR_PIPE_S233S --   0s /   0h -- 0.100
MY_HDR_PIPE_S23S  --   0s /   0h -- 0.100
MY_HDR_PIPE_S23C  --   0s /   0h -- 0.100
MY_HDR_PIPE_S24S  --   0s /   0h -- 0.100
MY_HDR_PIPE_S34P  --   0s /   0h -- 0.100
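
For anyone who wants to sort or filter the list above, here is a minimal Python sketch that parses lines in this layout. The names RuleHit, parse_line and ham_heavy are made up for illustration; this is not part of mass-check or of the scoring algorithm at the URL above, it only reads the columns as printed.

```python
import re
from typing import NamedTuple, Optional


class RuleHit(NamedTuple):
    """One result line: rule name, spam hits, ham hits, proposed score(s)."""
    name: str
    spam_hits: int
    ham_hits: int
    score: float
    alt_score: Optional[float]  # the value after "or", when a line has one


# Matches e.g. "MY_RBDY_PDS_2P8   --   4s /   5h -- 0.067 or -0.100"
LINE_RE = re.compile(
    r"^(?P<name>\w+)\s+--\s+(?P<spam>\d+)s\s*/\s*(?P<ham>\d+)h\s+--\s+"
    r"(?P<score>-?\d+\.\d+)(?:\s+or\s+(?P<alt>-?\d+\.\d+))?\s*$"
)


def parse_line(line: str) -> Optional[RuleHit]:
    """Return a RuleHit for a result line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    alt = m.group("alt")
    return RuleHit(
        name=m.group("name"),
        spam_hits=int(m.group("spam")),
        ham_hits=int(m.group("ham")),
        score=float(m.group("score")),
        alt_score=float(alt) if alt is not None else None,
    )


def ham_heavy(hit: RuleHit) -> bool:
    """True when the rule hit more ham than spam in the corpus."""
    return hit.ham_hits > hit.spam_hits


if __name__ == "__main__":
    sample = [
        "MY_RBDY_PDS_1P3   -- 375s /  22h -- 1.163",
        "MY_RBDY_PDS_5P3   --  99s / 464h -- 0.021 or -0.464",
    ]
    for text in sample:
        hit = parse_line(text)
        print(hit, "ham-heavy" if hit and ham_heavy(hit) else "")
```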

Two ham scored 5.0:
.  5 file=../massham/ham.0307.5360234
     rules=MY_RBDY_PDS_2P3, MY_RBDY_PDS_2P4, MY_RBDY_PDS_3P1,
           MY_RBDY_PDS_3P2, MY_RBDY_PDS_3P3, MY_RBDY_PDS_3P4,
           MY_RBDY_PDS_4P3, MY_RBDY_PDS_4P5, MY_RBDY_PDS_5P3,
           MY_RBDY_PDS_6P6  
.  5 file=../massham/ham.0307.5383420
     rules=MY_RBDY_PDS_2P3, MY_RBDY_PDS_2P4, MY_RBDY_PDS_3P1,
           MY_RBDY_PDS_3P2, MY_RBDY_PDS_3P3, MY_RBDY_PDS_3P4,
           MY_RBDY_PDS_4P3, MY_RBDY_PDS_4P5, MY_RBDY_PDS_5P3,
           MY_RBDY_PDS_6P6  
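
One way to read the 5.0 figure, sketched below in Python: each of these two ham hit the same ten rules, and ten rules at 0.5 apiece add up to 5.0. That the run used a flat 0.5 per rule is an assumption taken from the "0.5 to 1.0" remark quoted above; the score actually used in the run is not stated here. ASSUMED_SCORE is a name made up for this sketch.

```python
# Rules hit by both ham messages, copied from the two listings above.
RULES_HIT = [
    "MY_RBDY_PDS_2P3", "MY_RBDY_PDS_2P4", "MY_RBDY_PDS_3P1",
    "MY_RBDY_PDS_3P2", "MY_RBDY_PDS_3P3", "MY_RBDY_PDS_3P4",
    "MY_RBDY_PDS_4P3", "MY_RBDY_PDS_4P5", "MY_RBDY_PDS_5P3",
    "MY_RBDY_PDS_6P6",
]

# Assumption: every rule carried the same flat score during the run.
ASSUMED_SCORE = 0.5

# Total contributed by these rules under that assumption.
total = len(RULES_HIT) * ASSUMED_SCORE

if __name__ == "__main__":
    print(f"{len(RULES_HIT)} rules x {ASSUMED_SCORE} = {total}")
```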

Negative scores: So far when I use rules like this I've generally been
scoring them positive, often with a minimum of 0.100, even when they hit
more ham than spam. My philosophy has been "This is often used/seen in
spam. If it's spam, there should be enough rules hit to flag it as spam.
If it's ham, the few rules hit wouldn't matter since they wouldn't reach
my spam threshold."

So, note to spammers: you can't fake your way past my system by using
those "more ham than spam" combinations -- they won't get you negative
scores.

However, for rules like
MY_RBDY_PDS_6P6   --  99s / 464h -- 0.021 or -0.464
which hit several ham for each spam, I'm thinking it might be useful to
score them negatively in my system, as a way of avoiding FPs when using
rule sets like these.
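
To make the two options concrete, here is a rough Python sketch of the choice being discussed: keep the positive score (optionally raised to a minimum such as 0.100), or switch to the negative alternative once ham hits outnumber spam hits by some factor. The function name pick_score, the ham_ratio cutoff of 3.0 and the min_positive parameter are all hypothetical; "several ham for each spam" is not pinned to a number anywhere above, and the candidate scores are simply taken as given rather than computed.

```python
from typing import Optional


def pick_score(
    spam_hits: int,
    ham_hits: int,
    positive: float,
    negative: Optional[float],
    ham_ratio: float = 3.0,     # hypothetical cutoff for "several ham per spam"
    min_positive: float = 0.0,  # set to 0.100 to apply a positive floor
) -> float:
    """Choose between the positive and negative candidate score for a rule."""
    # Use the negative alternative only if one exists and the rule hits at
    # least `ham_ratio` times as many ham as spam.
    if negative is not None and ham_hits >= ham_ratio * spam_hits and ham_hits > 0:
        return negative
    # Otherwise keep the positive score, raised to the floor if one is set.
    return max(positive, min_positive)


if __name__ == "__main__":
    # Figures copied from the results list in this message.
    print(pick_score(99, 464, 0.021, -0.464))   # ham-heavy by more than 3x
    print(pick_score(481, 519, 0.093, -0.108))  # roughly even hits
    print(pick_score(140, 0, 2.400, None))      # no ham hits, no alternative
```

With the default cutoff, only rules like MY_RBDY_PDS_6P6 go negative; rules that hit ham and spam about equally keep their small positive score, so they still cannot be used to pull a spam message under the threshold.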

How do others feel about this type of question?

Bob Menschel




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
