> From: Chris Petersen
[...]
>
> Yes, though I used:
>
> /(\b[a-z]{4,12}\s+){12}/
>
> Notice the initial \b. There's also no need to make SA keep searching
> beyond the "minimum" match, so leave the comma out of the last {} quantifier.
>
Looks good. Just running this over a ham mailbox with about 500 messages
and a spam mailbox of about the same size, without decoding base64 and
such, I see the following:
Length   ham   spam
    10    10    378
    11     5    324
    12     4    282
    13     0    239
(those are lines that matched, not messages).
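
For anyone who wants to repeat the count, here is a minimal Python sketch of the per-line tally. It is not the script used for the numbers above; the helper names (run_pattern, count_matching_lines) are made up for illustration, and it assumes you already have the raw, undecoded mailbox lines as strings. "Length" is the repeat count in the last {} of the pattern.

```python
import re

def run_pattern(n):
    # n consecutive lowercase words of 4-12 letters, each followed by whitespace
    # (same shape as the quoted rule, with the repeat count as a parameter)
    return re.compile(r"(\b[a-z]{4,12}\s+){%d}" % n)

def count_matching_lines(lines, n):
    # Counts lines that match, not messages
    pat = run_pattern(n)
    return sum(1 for line in lines if pat.search(line))

if __name__ == "__main__":
    sample = ["word " * 12, "word " * 10, "Some Capitalized words here "]
    for n in (10, 11, 12, 13):
        print(n, count_matching_lines(sample, n))
```

Run once over the ham lines and once over the spam lines to get the two columns.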
Two of the 4 hits at length 12 were from an iPod advert:
-> There's also great news if you want a larger iPod. From the very
beginning we've worked continually to increase the amount of music iPod can
hold without sacrificing portability, and we're pleased to announce a new
15GB model for only $299. It's available immediately, as are the 20GB model
for $399 and the 40GB model for $499. And if you buy now, you can make the
world's best personal digital music player even more personal with free
custom laser engraving.**
which one might be tempted to classify as spam. <g>
The other two matches were a bit of a special case: a message that I'd
sent, with myself cc'd, giving a list of the 300 most used words in
English. Not something I'd send out every day.
I tend to like the idea of weighting the 10-word sequence low, say 0.5,
with the 13-word sequence getting an extra bump of 2.0 on top (making a
total of 2.5).
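
To make the stacking concrete, here is a rough Python sketch of how the two tiers would add up. It is only an illustration of the arithmetic, not how SA itself scores rules; TIERS and word_run_score are hypothetical names, and only the 10 and 13 tiers with 0.5 and 2.0 come from the suggestion above.

```python
import re

# (run length, extra points). Any text that hits the 13 tier also hits the
# 10 tier, so the points add up: 0.5 + 2.0 = 2.5.
TIERS = [(10, 0.5), (13, 2.0)]

def word_run_score(text):
    score = 0.0
    for n, points in TIERS:
        # Same pattern shape as the quoted rule, with the repeat count filled in
        if re.search(r"(\b[a-z]{4,12}\s+){%d}" % n, text):
            score += points
    return score

if __name__ == "__main__":
    for words in (9, 10, 13):
        print(words, word_run_score("word " * words))
```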
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk