On 8/2/23 15:52, David B Funk wrote:
> Regardless, if a message has never been seen before and has little
> correlation to earlier messages, its Bayes score should land somewhere
> in the 40% to 60% range.
> The fact that it hit 00% indicates a strong correlation to lots of ham
> (or something is screwy with your Bayes).
OK, here's what I got just now:
[thomas.cameron@mail-east ~]$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      41449          0  non-token data: nspam
0.000          0      49720          0  non-token data: nham
0.000          0     162741          0  non-token data: ntokens
0.000          0 1689089541          0  non-token data: oldest atime
0.000          0 1691009577          0  non-token data: newest atime
0.000          0 1691007146          0  non-token data: last journal sync atime
0.000          0 1690991018          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0      13879          0  non-token data: last expire reduction count
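For anyone reading the dump: the atime fields are Unix epoch seconds, and the expire atime delta is a duration in seconds. A minimal sketch for decoding them, assuming GNU date (the -d @SECONDS form); epoch_to_utc is a made-up helper name, not part of SpamAssassin:

```shell
# Sketch: decode the epoch-second fields from `sa-learn --dump magic`.
# Assumes GNU date; epoch_to_utc is a hypothetical helper.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1689089541   # oldest atime -> 2023-07-11 15:32:21 UTC
epoch_to_utc 1691009577   # newest atime -> 2023-08-02 20:52:57 UTC

# The expire atime delta is a duration, so integer division gives whole days.
echo "$(( 1382400 / 86400 )) days"   # -> 16 days
```

So the token database above spans roughly three weeks of access times, and the newest atime matches the time of this message.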
I can absolutely re-train Bayes. I am kind of an email pack-rat, so I
have over a gigabyte of saved known-good emails in various folders. I
have SA set up so that each user's mail is scanned individually via a
procmail rule:
[thomas.cameron@mail-east ~]$ head .procmailrc
MAILDIR=$HOME/mail
LOGFILE=$MAILDIR/procmail.log
:0fw: spamassassin.lock
* < 512000
| spamassassin
I have the users move spam to an imap folder, and then run (via the
user's cron job):
sa-learn --mbox --spam /home/[username]/mail/spam
If something is flagged as spam and it's not supposed to be, I have them
copy it to the ham folder and I run (also via cron job):
sa-learn --mbox --ham /home/[username]/mail/ham
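The two training runs differ only in the class flag and the folder. A minimal sketch that derives both from one argument, so the flag and the path cannot drift apart. It assumes the mbox folders are named mail/spam and mail/ham; learn_cmd and the user name someuser are hypothetical, and the helper only prints the command rather than running it:

```shell
# Hypothetical helper: print the sa-learn command for one user and one class.
# Assumes the mbox folder is named after the class (mail/spam, mail/ham).
learn_cmd() {
    user=$1
    class=$2
    case $class in
        spam|ham) ;;
        *) echo "class must be spam or ham" >&2; return 1 ;;
    esac
    printf 'sa-learn --mbox --%s /home/%s/mail/%s\n' "$class" "$user" "$class"
}

learn_cmd someuser spam   # sa-learn --mbox --spam /home/someuser/mail/spam
learn_cmd someuser ham    # sa-learn --mbox --ham /home/someuser/mail/ham
```

Feeding a spam folder to --ham (or the reverse) would teach Bayes exactly the wrong thing, so it seems worth making that mistake hard to type.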
For my email account, I've used my inbox and various other folders to
train Bayes in the past (although it's definitely been a while since I
did Bayes maintenance), but I have zero issue nuking my personal Bayes
data and starting over.
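If starting over is the way to go, a rough sketch of what that could look like for one account: dump the current database with sa-learn --backup, wipe it with sa-learn --clear, then learn ham and spam again. The function names, the DRY_RUN switch, and the mail/ham and mail/spam folder names are assumptions for illustration, not an existing setup:

```shell
# Sketch only: back up, wipe, and retrain one user's Bayes database.
# With DRY_RUN set, each command is printed instead of being run.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

retrain() {
    maildir=$1   # assumed to contain mbox folders named ham and spam
    backup=$2    # file that receives the old database dump
    if [ -n "${DRY_RUN:-}" ]; then
        echo "sa-learn --backup > $backup"
    else
        # --backup writes the dump to stdout; stop if it fails
        sa-learn --backup > "$backup" || return 1
    fi
    run sa-learn --clear || return 1
    run sa-learn --mbox --ham "$maildir/ham"
    run sa-learn --mbox --spam "$maildir/spam"
}
```

Setting DRY_RUN=1 and calling retrain "$HOME/mail" "$HOME/bayes-backup.txt" prints the four commands without touching anything. Keep in mind that, with default settings, SpamAssassin does not use Bayes at all until it has learned at least 200 spam and 200 ham messages.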
Thoughts?
--
Thomas