Hello Scott,

Tuesday, January 6, 2004, 4:09:24 PM, you wrote:

SAC> On Tue, 30 Dec 2003 13:48:17 -0600, "Dallas L. Engelken" <[EMAIL PROTECTED]> writes:

>> # SUBJ_SPELLING_00 -- 2283s/1850h of 10971 corpus, 2003-12-30

SAC> This doesn't tell me much. How many spams and hams are in the corpus?
SAC> This would be a spectacular rule if the corpus is 23% spam --- it
SAC> would catch nearly every one. If on the other hand, the corpus was 80%
SAC> spam, this would be a bad rule --- it would have caught nearly every
SAC> ham.

SAC> Could others who report rules test results please state what
SAC> percentage of their corpus is spam/ham?

Good point. It looks like Dallas was copying my statistics format, which
I developed when I was unable to get hit-frequencies working. Example:

> uri       RM_u_UnsubscribePHP   /unsubscribe\.php/i
> describe  RM_u_UnsubscribePHP   text uri to unsubscribe link
> score     RM_u_UnsubscribePHP   3.000 # Dec 2003; 218s/0h of 81383 corpus

Even now, I find this easier to read and understand than a bare
Frequencies output:

> #Freqs: 213 213 0 1.000 0.95 3.00 RM_u_UnsubscribePHP

(partly because the Frequencies line includes neither the total
spam/ham counts nor the total corpus count).

I have just updated my masscheck script, so future reports should look
more like:

> score RM_u_UnsubscribePHP 3.000 # Dec 2003; 218s/0h of 81383 corpus
> (65609s/15774h)

Bob Menschel
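
For anyone who wants to turn these comment lines into per-class hit rates, here is a minimal sketch in Python. It is not part of the masscheck script or of SpamAssassin; the names parse_report and hit_rates are made up for illustration, and the regular expression only assumes the "NNs/NNh of NN corpus (NNs/NNh)" layout shown above. For the older format without the totals in parentheses, the caller has to guess a spam fraction, which is exactly the ambiguity Scott pointed out.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches e.g. "218s/0h of 81383 corpus (65609s/15774h)". The parenthesised
# per-class totals are optional (the older format omits them) and may sit on
# a following quoted line, hence the "[\s>]*" before the parenthesis.
REPORT_RE = re.compile(
    r"(\d+)s/(\d+)h of (\d+) corpus(?:[\s>]*\((\d+)s/(\d+)h\))?"
)


class Report(NamedTuple):
    spam_hits: int
    ham_hits: int
    corpus: int
    spam_total: Optional[int]  # None when the report omits the totals
    ham_total: Optional[int]


def parse_report(text: str) -> Report:
    """Extract hit counts and corpus sizes from a rule-statistics comment."""
    m = REPORT_RE.search(text)
    if m is None:
        raise ValueError("no 'NNs/NNh of NN corpus' statistics found")
    s, h, n, st, ht = m.groups()
    return Report(
        int(s), int(h), int(n),
        int(st) if st is not None else None,
        int(ht) if ht is not None else None,
    )


def hit_rates(r: Report, spam_fraction: Optional[float] = None) -> Tuple[float, float]:
    """Return (fraction of spam hit, fraction of ham hit).

    Uses the reported per-class totals when present; otherwise falls back to
    an ASSUMED spam_fraction of the corpus supplied by the caller.
    """
    if r.spam_total is not None and r.ham_total is not None:
        spam_total, ham_total = float(r.spam_total), float(r.ham_total)
    elif spam_fraction is not None:
        spam_total = r.corpus * spam_fraction
        ham_total = r.corpus - spam_total
    else:
        raise ValueError("per-class totals missing; supply spam_fraction")
    if spam_total <= 0 or ham_total <= 0:
        raise ValueError("corpus must contain both spam and ham")
    return r.spam_hits / spam_total, r.ham_hits / ham_total


new = parse_report("3.000 # Dec 2003; 218s/0h of 81383 corpus (65609s/15774h)")
print(hit_rates(new))

# Older format: the same numbers read very differently at 23% vs 80% spam.
old = parse_report("# SUBJ_SPELLING_00 -- 2283s/1850h of 10971 corpus, 2003-12-30")
print(hit_rates(old, spam_fraction=0.23))
print(hit_rates(old, spam_fraction=0.80))
```

With the totals included, the rates follow directly from the line itself; without them, the two guesses for SUBJ_SPELLING_00 give opposite verdicts on the same rule.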