On 8/2/23 15:52, David B Funk wrote:

> Regardless, if a message has never been seen before and has little correlation to earlier messages, its Bayes score should land somewhere in the 40% to 60% range.
>
> The fact that it hit 00% indicates a strong correlation to lots of ham (or something is screwy with your Bayes).
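To check which Bayes bucket a particular saved message lands in, something like the following should work. It is a minimal sketch that assumes the message is saved as a single file. `bayes_rule` and the `SPAMASSASSIN` override variable are hypothetical names. It relies only on `spamassassin -t` (test mode), which appends a report of the rules that hit, and it prints the first `BAYES_xx` name found in the output. If the saved copy still carries headers from an earlier scan, read the full report instead.

```shell
#!/bin/sh
# Hypothetical helper: print the first BAYES_xx rule name that
# `spamassassin -t` reports for one saved message file.
# SPAMASSASSIN can be overridden, e.g. to point at a different binary.
SPAMASSASSIN="${SPAMASSASSIN:-spamassassin}"

bayes_rule() {
  # -t = test mode: the message is scanned and a rule report is appended.
  # grep -oE (GNU/BSD grep) pulls out just the rule names.
  "$SPAMASSASSIN" -t < "$1" | grep -oE 'BAYES_[0-9]+' | head -n 1
}
```

Usage: `bayes_rule ~/saved/message.eml` prints a rule name such as `BAYES_00` or `BAYES_50`.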

OK, here's what I got just now:

[thomas.cameron@mail-east ~]$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      41449          0  non-token data: nspam
0.000          0      49720          0  non-token data: nham
0.000          0     162741          0  non-token data: ntokens
0.000          0 1689089541          0  non-token data: oldest atime
0.000          0 1691009577          0  non-token data: newest atime
0.000          0 1691007146          0  non-token data: last journal sync atime
0.000          0 1690991018          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0      13879          0  non-token data: last expire reduction count
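The atime values above are Unix epoch seconds. A small filter like this one can pull out the counts and turn the oldest/newest token access times into dates. It is a sketch: `summarize_magic` is a hypothetical name, and the date conversion assumes GNU `date` (`-d @EPOCH`).

```shell
#!/bin/sh
# Hypothetical helper: summarize `sa-learn --dump magic` output on stdin.
# Prints nspam, nham, ntokens, and the oldest/newest atime as UTC dates.
summarize_magic() {
  awk '
    # Field 3 is the value column; the label follows "non-token data: ".
    /non-token data: nspam$/        { print "nspam",   $3 }
    /non-token data: nham$/         { print "nham",    $3 }
    /non-token data: ntokens$/      { print "ntokens", $3 }
    /non-token data: oldest atime$/ { print "oldest",  $3 }
    /non-token data: newest atime$/ { print "newest",  $3 }
  ' | while read -r key value; do
    case "$key" in
      # Convert epoch seconds to a readable UTC timestamp (GNU date).
      oldest|newest) echo "$key $(date -u -d "@$value" '+%Y-%m-%d %H:%M:%S UTC')" ;;
      *) echo "$key $value" ;;
    esac
  done
}
```

Usage: `sa-learn --dump magic | summarize_magic`.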

I can absolutely re-train Bayes. I am kind of an email pack-rat, so I have over a gigabyte of saved known-good email in various folders. I have SA set up so that each message is scanned individually, on a per-user basis, via a procmail rule:

[thomas.cameron@mail-east ~]$ head .procmailrc
MAILDIR=$HOME/mail
LOGFILE=$MAILDIR/procmail.log

:0fw: spamassassin.lock
* < 512000
| spamassassin

I have the users move spam to an IMAP folder, and then run (via each user's cron job):

sa-learn --mbox --spam /home/[username]/mail/spam

If something is flagged as spam but shouldn't be, I have them copy it to the ham folder, and I run (also via cron job):

sa-learn --mbox --ham /home/[username]/mail/ham
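A minimal sketch of what that per-user cron job could look like, assuming the spam and ham IMAP folders are mbox files named `spam` and `ham` under the user's mail directory. The folder names, the `train_bayes` function, and the `SA_LEARN` override variable are hypothetical. The point of the structure is that each folder is only ever learned as its own class.

```shell
#!/bin/sh
# Hypothetical per-user training job: learn the spam folder as spam and
# the separate ham folder as ham. SA_LEARN can be overridden for testing.
SA_LEARN="${SA_LEARN:-sa-learn}"

train_bayes() {
  maildir="$1"
  # Spam folder -> learned as spam (skipped if the folder doesn't exist).
  if [ -f "$maildir/spam" ]; then
    "$SA_LEARN" --mbox --spam "$maildir/spam" || return 1
  fi
  # Ham folder -> learned as ham; never point --ham at the spam folder.
  if [ -f "$maildir/ham" ]; then
    "$SA_LEARN" --mbox --ham "$maildir/ham" || return 1
  fi
}
```

To use it from cron, append a call such as `train_bayes "$HOME/mail"` to the script and run the script from the user's crontab.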

For my email account, I've used my inbox and various other folders to train Bayes in the past (although it's definitely been a while since I did any Bayes maintenance), but I have no problem nuking my personal Bayes data and starting over.
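If I do start over, the sequence would be roughly: back up, clear, relearn. This is a sketch assuming `sa-learn`'s `--backup`, `--clear` and `--mbox` options. `retrain_bayes`, the `SA_LEARN` override variable, and the mbox paths in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: back up, wipe, and rebuild one user's Bayes DB.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Usage: retrain_bayes BACKUP_FILE SPAM_MBOX HAM_MBOX [HAM_MBOX...]
retrain_bayes() {
  # Check the argument count before shifting (shift 2 can be fatal in sh).
  if [ "$#" -lt 3 ]; then
    echo "usage: retrain_bayes BACKUP_FILE SPAM_MBOX HAM_MBOX..." >&2
    return 2
  fi
  backup="$1"; spam="$2"; shift 2
  # Save the current database first so it can be restored later.
  "$SA_LEARN" --backup > "$backup" || return 1
  # Wipe this user's existing Bayes data.
  "$SA_LEARN" --clear || return 1
  # Relearn from the saved folders: all ham mboxes, then the spam mbox.
  "$SA_LEARN" --mbox --ham "$@" || return 1
  "$SA_LEARN" --mbox --spam "$spam" || return 1
}
```

Usage: `retrain_bayes ~/bayes-backup.txt ~/mail/spam ~/mail/inbox ~/mail/saved`. If the rebuild goes badly, `sa-learn --restore ~/bayes-backup.txt` can load the backup back in.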

Thoughts?

--
Thomas
