Thursday, December 4, 2003, 10:23:38 PM, I responded to Jon Gerdes'
earlier email:

JG>> I'm still ploughing through some of your rules.  I've amended your
JG>> CONSFROM9 rule somewhat: 

JG>> # A From without vowels is probably invalid
JG>> header   WHL_CONSFROM    From =~ /[EMAIL PROTECTED],20}\b/i
JG>> describe WHL_CONSFROM    From contains word consisting of consecutive consonants
JG>> score    WHL_CONSFROM    1.5

JG>> I just inverted the logic to NOT vowel and added in the @ symbol.
JG>> This address got picked up: 
JG>> [EMAIL PROTECTED]
JG>> OK could just up the limits in {} but I think that would miss the
JG>> point of the rule.  What do you think?

I ran your rule against my corpus, both with and without a \ before the @.
Both variants gave the same result: 6988s/218h (spam/ham hits) of 62626 corpus.
That compares to my RM_fl_ConsWord9 rule's 50s/0h of 63143 corpus.
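
To put those two hit counts side by side, here is a small Python sketch of the arithmetic. It is only an illustration: the function names are made up for this example, and it is not the scoring algorithm from the RM_RuleScoring page.

```python
def ham_share(spam_hits, ham_hits):
    """Fraction of a rule's hits that landed on ham (0.0 if it never hit)."""
    total = spam_hits + ham_hits
    return ham_hits / total if total else 0.0


def hit_rate(spam_hits, ham_hits, corpus_size):
    """Fraction of the whole corpus (spam and ham together) that the rule hit."""
    return (spam_hits + ham_hits) / corpus_size


# Counts taken from the results quoted above.
rules = {
    "WHL_CONSFROM": (6988, 218, 62626),
    "RM_fl_ConsWord9": (50, 0, 63143),
}

for name, (s, h, n) in rules.items():
    print(f"{name}: ham share {ham_share(s, h):.2%}, hit rate {hit_rate(s, h, n):.2%}")
```

The ham share is the figure that hurts: a rule can hit a lot of spam and still be risky if a visible fraction of its hits are ham.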

Per my admittedly arbitrary algorithm documented at
http://www.exit0.us/index.php/RM_RuleScoring I give my ConsWord9 rule a
1.50 score, and your ConsFrom rule a 1.319 score. Those ham hits really
bring down a score. (That's why my consonant list, bcghjklmnpqrstvwxz,
excludes d and f -- too many hits in ham.)
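
For anyone who wants to experiment with the consonant-list idea outside SpamAssassin, here is a rough Python sketch. It is not the actual RM_fl_ConsWord9 regex, which is not reproduced in this message. It assumes the "9" in the rule name means a run of at least 9 letters, and that the run must be a whole word; both are guesses.

```python
import re

# Consonant list quoted in the message; d and f are deliberately left out.
CONSONANTS = "bcghjklmnpqrstvwxz"

# Assumption: a "word" of at least 9 consecutive listed consonants.
MIN_RUN = 9

CONS_RUN = re.compile(r"\b[%s]{%d,}\b" % (CONSONANTS, MIN_RUN), re.IGNORECASE)


def has_consonant_word(text):
    """Return True if text contains a whole word made only of listed consonants."""
    return CONS_RUN.search(text) is not None


print(has_consonant_word("xkcjqwrtplm@example.com"))  # 11 listed consonants in a row
print(has_consonant_word("bob@example.com"))          # ordinary address
```

Leaving d and f out of the character class means any word containing either letter can never satisfy the run, which is how the exclusion cuts down on ham hits.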

I don't know what the impact would be on FPs ... how many other rules
would hit a ham that matched your ConsFrom rule? If few, then your rule
with its >6k spam hits would be better. If many, then mine with no ham
hits is better.

I'm not prepared at this time to try to analyze that hit pattern.

Bob Menschel



