(Sorry, I can't help with your main question about learning from an mbox file; I haven't done that myself. I can help with the second part, though.)
On a related note: How can I find out how many messages SA has learned from so far? I was just told that the Bayesian filter will kick in when I have 200 spam and 200 ham emails. I probably have all the ham I need from today's email batch, but it'll take me a while to gather 200 spams, so I'd like to know how close I am.
One quick way to see which factor is currently preventing SA from running bayes is to just turn on debug output:
[EMAIL PROTECTED] Mail-SpamAssassin-2.55]$ spamassassin -tD <sample-spam.txt
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/mkettler/.spamassassin" for user state dir
debug: using "/home/mkettler/.spamassassin/user_prefs" for user prefs file
debug: using "/home/mkettler/.spamassassin" for user state dir
debug: bayes: 27363 tie-ing to DB file R/O /home/mkettler/.spamassassin/bayes_toks
debug: bayes: 27363 tie-ing to DB file R/O /home/mkettler/.spamassassin/bayes_seen
debug: debug: Only 1 spam(s) in Bayes DB < 200
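If you want to pull that number out automatically, here is a rough Python sketch that scans saved debug output for the "Only N ... in Bayes DB" line. The regular expression is just my guess based on the one line shown above from 2.55; other versions may word the message differently, and the function name bayes_shortfall is made up for this example.

```python
import re

# Pattern modeled on the sample debug line above (an assumption, not a
# documented format): "Only <have> <kind>(s) in Bayes DB < <need>"
NOT_ENOUGH = re.compile(r"Only (\d+) (\w+)\(s\) in Bayes DB < (\d+)")

def bayes_shortfall(debug_text):
    """Return (kind, have, need) for the first matching line, or None."""
    for line in debug_text.splitlines():
        match = NOT_ENOUGH.search(line)
        if match:
            have, kind, need = match.groups()
            return kind, int(have), int(need)
    return None

sample = "debug: debug: Only 1 spam(s) in Bayes DB < 200"
print(bayes_shortfall(sample))  # ('spam', 1, 200)
```

A result of None only means no such line was found in the text you passed in; it says nothing else about the state of the database.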
For more detail, the tools subdirectory of the tarball contains a tool called check_bayes_db.
This tool dumps all the tokens and their probability statistics, but at the top it also prints your totals (the nspam and nham lines):
[EMAIL PROTECTED] tools]$ ./check_bayes_db |more
0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
0.000 0 1 0 non-token data: nspam
0.000 0 4 0 non-token data: nham
0.000 0 386 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest age
0.000 0 15 0 non-token data: current scan-count
0.000 0 0 0 non-token data: last expiry scan-count
0.722 1 1 13 THIS
0.803 1 0 1 AWESOMELY
<snip.. more tokens from here down>
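To turn those header lines into a "how far to go" figure, something like the following Python sketch could work. It assumes the count sits in the third column of each "non-token data:" line, which is only inferred from the sample above (nspam shows 1 there, matching the debug output). The helper names and the default threshold of 200 are illustrative, taken from the numbers in this thread.

```python
def bayes_totals(dump_text):
    """Collect the 'non-token data' header lines into a {name: count} dict."""
    totals = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue
        columns, name = line.split("non-token data:", 1)
        fields = columns.split()
        # Assumption: the third column carries the value, as in the sample.
        if len(fields) >= 3 and fields[2].isdigit():
            totals[name.strip()] = int(fields[2])
    return totals

def messages_needed(totals, threshold=200):
    """How many more spam and ham messages are needed to reach the threshold."""
    return {key: max(0, threshold - totals.get(key, 0))
            for key in ("nspam", "nham")}

sample = """\
0.000 0 1 0 non-token data: nspam
0.000 0 4 0 non-token data: nham
0.000 0 386 0 non-token data: ntokens
0.722 1 1 13 THIS
"""
print(messages_needed(bayes_totals(sample)))  # {'nspam': 199, 'nham': 196}
```

You could save the output of ./check_bayes_db to a file and pass its contents to bayes_totals; only the header lines are read, so the token lines further down are ignored.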
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk