Re: [SAtalk] Testing and weighting rules against corpus of spam/non-spam

Matt Kettler Mon, 18 Aug 2003 06:58:56 -0700

At 01:31 AM 8/18/03 -0400, Eric Hart wrote:

1. I've seen people commenting on specific rules, saying that a particular rule generates x false positives and y false negatives against their corpus of ham and spam. How are they running these tests?

Well, getting the statistics is pretty easy: use the mass-check and hit_frequencies tools that are in the "masses" subdirectory of the tarball.

Section 3.4 of the rule guide mentions this in brief.
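
To make concrete what those per-rule statistics are, here is a minimal Python sketch of the counting step. It is not the hit_frequencies tool and does not parse real mass-check output; it assumes you already have, for each message, the set of rule names that fired. The function name rule_hit_stats and the input layout are made up for illustration.

```python
from collections import Counter


def rule_hit_stats(ham_hits, spam_hits):
    """Count how often each rule fires on ham and on spam.

    ham_hits / spam_hits: one iterable of rule names per message
    (a hypothetical in-memory layout, not the mass-check log format).
    """
    # set(msg) ensures a rule is counted at most once per message.
    ham_counts = Counter(rule for msg in ham_hits for rule in set(msg))
    spam_counts = Counter(rule for msg in spam_hits for rule in set(msg))

    n_ham, n_spam = len(ham_hits), len(spam_hits)
    stats = {}
    for rule in sorted(set(ham_counts) | set(spam_counts)):
        stats[rule] = {
            # For a spam-indicating rule, every ham hit is a false positive.
            "ham_hits": ham_counts[rule],
            "spam_hits": spam_counts[rule],
            "ham_pct": 100.0 * ham_counts[rule] / n_ham if n_ham else 0.0,
            "spam_pct": 100.0 * spam_counts[rule] / n_spam if n_spam else 0.0,
        }
    return stats


if __name__ == "__main__":
    # Tiny made-up corpus: three ham messages, two spam messages.
    ham = [{"RULE_A"}, set(), {"RULE_B"}]
    spam = [{"RULE_A", "RULE_C"}, {"RULE_C"}]
    for rule, s in rule_hit_stats(ham, spam).items():
        print(rule, s)
```

A rule that hits a high percentage of spam and close to zero ham is a good candidate for a high score; one that hits ham about as often as spam is not telling you much.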

2. I would like to run my entire rulebase against a ham/spam corpus, and arrive at statistically "best" weighting of rules. How is this done?

Well, you first run the mass-checks, and then feed the results to the GA (the genetic algorithm). I've never run the GA myself, but you might be able to get some information out of the files in the masses subdirectory. Note that currently Theo is the only one who runs the GA. Since only a single developer uses it, don't expect a comprehensive user's manual, but the source code might have some decent comments in it.
