-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Justin,

Friday, November 7, 2003, 10:19:15 AM, you wrote:

>>Many of us are finding we hit limits with simple regex rules. To me, an
>>accumulator eval for rules is the next logical step. 
>>
>>Make sense?

JM> BTW, SpamAssassin originally started with accumulating rules.  But I
JM> took it out, as it meant a long hammy mail had a much higher chance
JM> of FP'ing, due to containing more text.  

JM> I'd be worried that accumulating hits would reintroduce the same
JM> problem...

Yes, that's a risk.

It's a risk similar to that facing a long hammy email which comes from
Yahoo, with a [EMAIL PROTECTED] email address, and therefore has a
printer cartridge ad at the bottom, discusses how he's facing bankruptcy,
and therefore needs to buy something for just some amount, and shouts a
lot. Except for the facing bankruptcy part, that's my father.

The goal is to use such accumulating rules intelligently. Those of us who
have developed our own corpus and mass check testing method would
obviously test these rules against our corpus. Any accumulating rule that
risks FPs would be scored low.

Or better: what if we specified in the rule a maximum score to accumulate
to? Maybe something like:

accumbody  T_SAMPLE  /(?:word1|word2|word3|word4|word5)/i,max=2.5
describe   T_SAMPLE  Message has medical words frequently used in spam
score      T_SAMPLE  0.5

Each time any of the five words appeared, the rule would score 0.5, up to
a maximum of 2.5. No matter how long the message was, this rule could not
cause an FP by itself, since the cap stays well below the spam threshold;
it would only flag something as spam in conjunction with other rules.
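
To make the capping behaviour concrete, here is a minimal sketch in Python.
It is not SpamAssassin code: "accumbody" and "max=" are only the syntax
proposed above, and the function name and defaults are made up for
illustration.

```python
import re

def accumulated_score(body, pattern, per_hit=0.5, max_score=2.5):
    """Add per_hit for every match of pattern in body, capped at max_score.

    Hypothetical helper mirroring the proposed rule; per_hit plays the role
    of the "score" line and max_score the role of the proposed "max=" option.
    """
    hits = len(re.findall(pattern, body, flags=re.IGNORECASE))
    return min(hits * per_hit, max_score)

# Same placeholder pattern as the T_SAMPLE rule above.
T_SAMPLE = r"(?:word1|word2|word3|word4|word5)"

short_msg = "word1 and word3"   # two hits
long_msg = "word2 " * 40        # forty hits in a long message

print(accumulated_score(short_msg, T_SAMPLE))  # 1.0
print(accumulated_score(long_msg, T_SAMPLE))   # 2.5, capped
```

The long message hits forty times but still contributes only 2.5, which is
the whole point of the cap.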

I can see that adding this capability to a GA run would be a challenge,
but if you can manage it, this would certainly lessen the FP risk.
Perhaps the GA run could even calculate what the max should be in order
to avoid FPs against the corpus.
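
As a rough illustration of picking the max from a corpus, the sketch below
uses a plain linear search rather than a GA. Everything in it is an
assumption: the function name, the (other_score, hits) input format, and
the 5.0 threshold are placeholders, not part of any existing mass-check
tool.

```python
def safe_max(ham_cases, per_hit=0.5, threshold=5.0):
    """Largest cap (a multiple of per_hit) that pushes no ham to threshold.

    ham_cases: list of (other_score, hits) pairs, one per ham message.
    other_score is the total from all other rules; hits is the number of
    matches of the accumulating rule in that message. Both are assumed to
    come from a corpus test run.
    """
    max_hits = max((hits for _, hits in ham_cases), default=0)
    best = 0.0
    for n in range(1, max_hits + 1):
        cap = n * per_hit
        # Raising the cap never lowers a score, so stop at the first FP.
        if any(other + min(hits * per_hit, cap) >= threshold
               for other, hits in ham_cases):
            break
        best = cap
    return best

# Three made-up ham messages: (score from other rules, accumulating hits).
ham = [(1.0, 3), (3.2, 10), (0.0, 40)]
print(safe_max(ham))  # 1.5
```

Here the second message already scores 3.2 from other rules, so any cap of
2.0 or more would tip it over 5.0, and the search settles on 1.5.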

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBP6xT5pebK8E4qh1HEQK9YwCg+/iy8DdaNhaXvU5LSGKtZXJkz+MAoOHo
7gndiL2StS/+5HezNFCUBsJy
=OXfx
-----END PGP SIGNATURE-----




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
