> Looks good. just running this over a ham mail box with about 500 messages
> and a spam mail box with the same, and not decoding base64 and such, I
> see the following:

What about something like:

/(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}/

I'm still trying to think of extremely common 4-letter words to exclude,
so treat this as just a quick example.
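
Here is a rough way to try the idea out, using Python's re module as a
stand-in for the Perl regex (the syntax is the same for this pattern).
The sample words and the names GOOD, LITERAL_EQ, random_run and
broken_run are made up for illustration. Note that the lookahead has to
be written (?!...); written as (?!=...) it would look for a literal "="
in front of the word and never exclude anything.

```python
import re

# Quick-example stop list of common 4-letter words (would need tuning).
STOP = "from|even|more|were|with"

# Intended form: the negative lookahead skips the common words.
GOOD = re.compile(r"(?:\b(?!(?:%s)\b)[a-z]{4,12}\s+){12}" % STOP)

# Form with a stray "=": the lookahead only rejects a literal "=from" etc.,
# so ordinary stop words are not excluded at all.
LITERAL_EQ = re.compile(r"(?:\b(?!=(?:%s)\b)[a-z]{4,12}\s+){12}" % STOP)

# Made-up sample text: 12 lowercase words of 4 to 12 letters each.
words = ("river candle harbor motion silver planet "
         "window garden thunder marble forest castle").split()

# A run of 12 "random" words, each followed by whitespace.
random_run = " ".join(words) + "\n"

# The same words with a common word in the middle, which splits the
# sequence into two runs of only 6 words.
broken_run = " ".join(words[:6] + ["with"] + words[6:]) + "\n"

print(bool(GOOD.search(random_run)))        # 12-word run
print(bool(GOOD.search(broken_run)))        # run interrupted by "with"
print(bool(LITERAL_EQ.search(broken_run)))  # stop word is not excluded
```

Since the pattern is lowercase-only, a capitalized word also breaks a
run, which may or may not be what you want.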

> I tend to like the idea of weighting the 10 sequence low, say 0.5,
> and the 13 sequence would get an extra bump of 2.0 more (making a
> total of 2.5).

That makes sense, though I'd probably score the 10-word rule low and the
15-word rule high (a total of 3 or more).  But that's just me:

rawbody WORDWORD_10        /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){10}/
describe WORDWORD_10       string of 10+ random words
score WORDWORD_10          .5
                                                                                       
                  
rawbody WORDWORD_15        /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){15}/
describe WORDWORD_15       string of 15+ random words
score WORDWORD_15          2.5
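
Since SpamAssassin adds up the scores of every rule that hits, and any
15-word run also contains a 10-word run, a message matching WORDWORD_15
would pick up both scores. A minimal sketch of that arithmetic, again
using Python's re as a stand-in; RULES and total_score are hypothetical
names, and this ignores how SpamAssassin actually splits up the raw body
before matching:

```python
import re

# One "random word" followed by whitespace, skipping the common words.
WORD = r"\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+"

# Rule name -> (compiled pattern, score), mirroring the two rules.
RULES = {
    "WORDWORD_10": (re.compile(r"(?:%s){10}" % WORD), 0.5),
    "WORDWORD_15": (re.compile(r"(?:%s){15}" % WORD), 2.5),
}

def total_score(body):
    """Sum the scores of every rule whose pattern matches the body."""
    return sum(score for pattern, score in RULES.values()
               if pattern.search(body))

# Made-up sample text: 15 lowercase words of 4 to 12 letters each.
words = ("river candle harbor motion silver planet window garden "
         "thunder marble forest castle bridge meadow lantern").split()

ten_words = " ".join(words[:10]) + "\n"
fifteen_words = " ".join(words) + "\n"

print(total_score(ten_words))      # only WORDWORD_10 hits
print(total_score(fifteen_words))  # both rules hit
```

So the 15-word case ends up at 0.5 + 2.5 = 3.0 in total.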

-- 
Chris Petersen
Programmer / Web Designer
Silicon Mechanics:  http://www.siliconmechanics.com/
Blade Servers:      http://www.siliconmechanics.com/c292/blade-server.php
1U Servers:         http://www.siliconmechanics.com/c272/1u-server.php



