> Looks good. just running this over a ham mail box with about 500 messages
> and a spam mail box with the same, and not decoding base64 and such, I
> see the following:

What about something like:

    /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}/

I'm still trying to think of extremely common 4-letter words to exclude, so
treat this as just a quick example.

> I tend to like the idea of weighting the 10 sequence low, say 0.5,
> and the 13 sequence would get an extra bump of 2.0 more (making a
> total of 2.5).

That makes sense, though I'd probably go with 10 low and 15 high (like 3 or
more). But that's just me:

    rawbody   WORDWORD_10  /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){10}/
    describe  WORDWORD_10  string of 10+ random words
    score     WORDWORD_10  0.5

    rawbody   WORDWORD_15  /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){15}/
    describe  WORDWORD_15  string of 15+ random words
    score     WORDWORD_15  2.5

-- 
Chris Petersen
Programmer / Web Designer
Silicon Mechanics: http://www.siliconmechanics.com/
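
The word-run rules can be tried outside SpamAssassin with a small script. What follows is only a sketch in Python, whose `re` module treats `\b`, `(?:...)`, `(?!...)` and `{m,n}` the same way Perl does for these patterns. The names `wordword_pattern`, `wordword_score` and `RULES` are made up for illustration, and the scoring loop is a simplified stand-in that ignores how SpamAssassin actually splits and scans `rawbody` text.

```python
import re

# Common four-letter words the rules skip (the list used in the rules).
COMMON_WORDS = ("from", "even", "more", "were", "with")


def wordword_pattern(count, common_words=COMMON_WORDS):
    r"""Compile a regex for `count` consecutive lowercase words of 4-12 letters.

    (?!(?:...)\b) is a negative lookahead: a word cannot match if it is
    exactly one of `common_words`. Writing (?!=...) instead would only
    reject text starting with a literal "=", so no word would be skipped.
    """
    excluded = "|".join(re.escape(word) for word in common_words)
    return re.compile(r"(?:\b(?!(?:%s)\b)[a-z]{4,12}\s+){%d}" % (excluded, count))


# Hypothetical stand-in for rule scoring: every rule that fires adds its score.
RULES = (
    (wordword_pattern(10), 0.5),  # WORDWORD_10
    (wordword_pattern(15), 2.5),  # WORDWORD_15
)


def wordword_score(body):
    """Sum the scores of all word-run rules that match somewhere in `body`."""
    return sum(score for pattern, score in RULES if pattern.search(body))
```

Because any run of 15 words also contains a run of 10, both rules fire on a long run, so under this sketch a 15-word run scores 0.5 + 2.5 = 3.0 in total. A common word such as "with" breaks a run, while a longer word that merely starts with one (such as "without") does not.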