-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks, that actually solved both of my problems: it put me on the track to
discovering that our sysadmin had deactivated the Bayesian filter, so I
simply reactivated it in user_prefs.
Thanks.
Daniel.

On Tue, Jul 22, 2003 at 11:55:13AM -0400, Matt Kettler wrote:
> At 12:32 AM 7/22/2003 -0400, Daniel Carrera wrote:
>
> (Sorry, I can't help you with your main question about learning from an
> mbox file. I've not done that myself, but I can help you with your second
> part.)
>
> >On a related note: How can I find out how many messages SA has learned
> >from so far? I was just told that the Bayesian filter will kick in when I
> >have 200 spam and 200 ham emails. I probably have all the ham I need from
> >today's email batch, but it'll take me a while to gather 200 spams, so I'd
> >like to know how close I am.
>
> One quick way to see which factor is currently preventing SA from running
> bayes is to just turn on debug output:
>
> [EMAIL PROTECTED] Mail-SpamAssassin-2.55]$ spamassassin -tD <sample-spam.txt
> debug: Score set 0 chosen.
> debug: running in taint mode? no
> debug: using "/usr/share/spamassassin" for default rules dir
> debug: using "/etc/mail/spamassassin" for site rules dir
> debug: using "/home/mkettler/.spamassassin" for user state dir
> debug: using "/home/mkettler/.spamassassin/user_prefs" for user prefs file
> debug: using "/home/mkettler/.spamassassin" for user state dir
> debug: bayes: 27363 tie-ing to DB file R/O /home/mkettler/.spamassassin/bayes_toks
> debug: bayes: 27363 tie-ing to DB file R/O /home/mkettler/.spamassassin/bayes_seen
> debug: debug: Only 1 spam(s) in Bayes DB < 200
>
> For a more detailed view, the tools subdirectory of the tarball contains a
> tool called check_bayes_db.
>
> This tool dumps all the tokens and their probability statistics, but up at
> the top it also spits out your totals:
>
> [EMAIL PROTECTED] tools]$ ./check_bayes_db |more
> 0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
> 0.000 0 1 0 non-token data: nspam
> 0.000 0 4 0 non-token data: nham
> 0.000 0 386 0 non-token data: ntokens
> 0.000 0 0 0 non-token data: oldest age
> 0.000 0 15 0 non-token data: current scan-count
> 0.000 0 0 0 non-token data: last expiry scan-count
> 0.722 1 1 13 THIS
> 0.803 1 0 1 AWESOMELY
> <snip.. more tokens from here down>

- -- 
Daniel Carrera      | OpenPGP fingerprint:
Mathematics Dept.   | 6643 8C8B 3522 66CB D16C D779 2FDD 7DAC 9AF7 7A88
UMD, College Park   | http://www.math.umd.edu/~dcarrera/pgp.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (SunOS)

iD8DBQE/HZ81nxE8DWHf+OcRAvcSAKCkvmxSV/1QluUQNFhGBOsuRb+BnwCgwZ5i
zHUPAjCOsaaau3DCTlk3eDM=
=NSPd
-----END PGP SIGNATURE-----
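
For anyone who wants the "how close am I?" answer without reading the dump by eye, here is a rough Python sketch that pulls the nspam and nham totals out of check_bayes_db output and reports how many more messages of each kind are needed. It assumes the column layout shown in the sample output quoted in this thread (the count sits in the third numeric column) and the 200 spam / 200 ham threshold mentioned there; other SpamAssassin versions may print something different. The function names learned_counts and remaining are made up for this example and are not part of SpamAssassin.

```python
import re

# Threshold quoted in the thread: Bayes kicks in at 200 spam and 200 ham.
MIN_LEARNED = 200

# Matches header lines such as "0.000 0 1 0 non-token data: nspam".
# Assumption: the learned-message count is the third numeric column,
# as in the sample output quoted in this thread.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$"
)


def learned_counts(dump_text):
    """Return {'nspam': n, 'nham': m} for the totals found in the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def remaining(counts, minimum=MIN_LEARNED):
    """Return how many more spam and ham messages are needed to reach minimum."""
    return {
        name: max(0, minimum - counts.get(name, 0))
        for name in ("nspam", "nham")
    }


# Header lines copied from the check_bayes_db output quoted in this thread.
SAMPLE = """\
0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
0.000 0 1 0 non-token data: nspam
0.000 0 4 0 non-token data: nham
0.000 0 386 0 non-token data: ntokens
0.722 1 1 13 THIS
"""

if __name__ == "__main__":
    counts = learned_counts(SAMPLE)
    print("learned so far:", counts)
    print("still needed:  ", remaining(counts))
```

To use it on a real database, the text of the dump could be read from a file or a pipe and passed to learned_counts in place of SAMPLE; token lines are ignored because they do not contain "non-token data".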