Hello Scott,

Tuesday, January 6, 2004, 4:09:24 PM, you wrote:

SAC> On Tue, 30 Dec 2003 13:48:17 -0600, "Dallas L. Engelken" <[EMAIL PROTECTED]> writes:

>> # SUBJ_SPELLING_00 -- 2283s/1850h of 10971 corpus, 2003-12-30

SAC> This doesn't tell me much. How many spams and hams are in the corpus?
SAC> This would be a spectacular rule if the corpus is 23% spam --- it
SAC> would catch nearly every one. If on the other hand, the corpus was 80%
SAC> spam, this would be a bad rule --- it would have caught nearly every
SAC> ham.
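
To make that arithmetic concrete, here is a minimal sketch that applies the two corpus compositions Scott mentions (23% and 80% spam) to the SUBJ_SPELLING_00 counts quoted above. The real split of that corpus is not stated, so both fractions are assumptions, and hit_rates is a hypothetical helper, not part of SpamAssassin.

```python
def hit_rates(spam_hits, ham_hits, corpus_size, spam_fraction):
    """Return (share of spam hit, share of ham hit) for an ASSUMED spam fraction."""
    total_spam = corpus_size * spam_fraction
    total_ham = corpus_size - total_spam
    return spam_hits / total_spam, ham_hits / total_ham


# Counts from the SUBJ_SPELLING_00 line: 2283s/1850h of 10971 corpus.
# The spam fractions are the two hypothetical cases from the message.
for assumed_fraction in (0.23, 0.80):
    spam_rate, ham_rate = hit_rates(2283, 1850, 10971, assumed_fraction)
    print(f"corpus {assumed_fraction:.0%} spam: "
          f"hits {spam_rate:.1%} of spam, {ham_rate:.1%} of ham")
```

The same hit counts give very different hit rates under the two assumptions, which is why the corpus split needs to be reported with the counts.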

SAC> Could others who report rules test results please state what
SAC> percentage of their corpus is spam/ham?

Good point. Looks like Dallas was copying my format of statistics,
developed when I was unable to get hit-frequencies working. Example:
> uri       RM_u_UnsubscribePHP    /unsubscribe\.php/i
> describe  RM_u_UnsubscribePHP    text uri to unsubscribe link
> score     RM_u_UnsubscribePHP    3.000  # Dec 2003; 218s/0h of 81383 corpus

Even now, it's easier for me to read and understand this than a bare
Frequencies output:
> #Freqs:     213      213        0    1.000   0.95   3.00  RM_u_UnsubscribePHP
(partly because the Freqs line includes neither the total spam/ham counts
nor the total corpus size).

I have just updated my masscheck script, so future reports should look
more like:
> score     RM_u_UnsubscribePHP    3.000  # Dec 2003; 218s/0h of 81383 corpus (65609s/15774h)
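
For anyone who wants to turn such a comment into percentages, the following is a rough sketch rather than anything taken from the masscheck script. The regular expression and the parse_stats name are assumptions based only on the format shown above.

```python
import re

# Hypothetical pattern for the comment format shown above:
# "<spam hits>s/<ham hits>h of <total> corpus (<total spam>s/<total ham>h)"
STATS_RE = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus\s*\((\d+)s/(\d+)h\)")


def parse_stats(comment):
    """Extract hit counts and corpus totals, and return percentages."""
    match = STATS_RE.search(comment)
    if match is None:
        raise ValueError("no corpus statistics found")
    spam_hits, ham_hits, total, total_spam, total_ham = map(int, match.groups())
    if total_spam + total_ham != total:
        raise ValueError("spam and ham totals do not add up to the corpus size")
    if total_spam == 0 or total_ham == 0:
        raise ValueError("corpus needs at least one spam and one ham")
    return {
        "spam_pct": 100.0 * spam_hits / total_spam,      # share of spam hit
        "ham_pct": 100.0 * ham_hits / total_ham,         # share of ham hit
        "corpus_spam_pct": 100.0 * total_spam / total,   # corpus composition
    }


line = ("score     RM_u_UnsubscribePHP    3.000  "
        "# Dec 2003; 218s/0h of 81383 corpus (65609s/15774h)")
stats = parse_stats(line)
print(stats)
```

The check that the spam and ham totals add up to the corpus size is a cheap guard against a mistyped report.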

Bob Menschel





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
