-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Justin,

Friday, November 7, 2003, 10:19:15 AM, you wrote:

>>Many of us are finding we hit limits with simple regex rules. To me, an
>>accumulator eval for rules is the next logical step.
>>
>>Make sense?

JM> BTW, SpamAssassin originally started with accumulating rules. But I
JM> took it out, as it meant a long hammy mail had a much higher chance
JM> of FP'ing, due to containing more text.

JM> I'd be worried that accumulating hits would reintroduce the same
JM> problem...

Yes, that's a risk. It's a risk similar to that facing a long hammy
email which comes from Yahoo, with a [EMAIL PROTECTED] email address,
and therefore has a printer cartridge ad at the bottom, discusses how
he's facing bankruptcy, and therefore needs to buy something for just
some amount, and shouts a lot. Except for the facing bankruptcy part,
that's my father.

The goal is to use such accumulating rules intelligently. Those of us
who have developed our own corpus and mass-check testing method would
obviously test these rules against our corpus. Any accumulating rule
that risks FPs would be scored low.

Or better: what if we specified in the rule a maximum score to
accumulate to? Maybe something like:

    accumbody T_SAMPLE /(?:word1|word2|word3|word4|word5)/i,max=2.5
    describe  T_SAMPLE Message has medical words frequently used in spam
    score     T_SAMPLE 0.5

Each time any of the five words was used, it'd score 0.5, up to a
maximum score of 2.5. No matter how long the message was, this rule
could not by itself cause an FP, and would flag something as spam only
in conjunction with other rules.

I can see that it would be a challenge adding this capability into a
GA run, but if you can manage it, this would certainly lessen the FP
risk. Perhaps the GA run could even calculate what the max should be
in order to avoid FPs according to the corpus.

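To make the arithmetic concrete, here is a minimal Python sketch of the capped accumulation proposed above. It is only an illustration of the idea, not SpamAssassin code: `accumbody` is a suggested directive, and the function name `accumulate_score` and its parameters are hypothetical names chosen for this example.

```python
import re

# Hypothetical stand-in for the proposed T_SAMPLE rule: the five
# placeholder words from the example, matched case-insensitively.
T_SAMPLE = re.compile(r"(?:word1|word2|word3|word4|word5)", re.IGNORECASE)


def accumulate_score(body, pattern, per_hit, max_score):
    """Score per_hit for every match of pattern in body, capped at max_score."""
    hits = len(pattern.findall(body))
    return min(hits * per_hit, max_score)


# A short message with two hits scores 2 * 0.5.
short_body = "word1 and word2"
# A very long message with forty hits still stops at the cap.
long_body = "word3 " * 40

print(accumulate_score(short_body, T_SAMPLE, 0.5, 2.5))  # 1.0
print(accumulate_score(long_body, T_SAMPLE, 0.5, 2.5))   # 2.5
```

The cap is what bounds the FP risk: however long the mail, the rule contributes at most `max_score`, so message length alone cannot push it over the spam threshold.
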
Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBP6xT5pebK8E4qh1HEQK9YwCg+/iy8DdaNhaXvU5LSGKtZXJkz+MAoOHo
7gndiL2StS/+5HezNFCUBsJy
=OXfx
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk