So, I'm running SA 2.60 with bayes enabled. I've got a folder to which people can drag emails that are misclassified. This has always worked very well in the past with 2.55.
What I've noticed is that when SA learns from a spam, the bayes score usually shoots way up to 99% right away (an improvement over 2.55, where learning did not raise the bayes score by that much in most cases). This is great. However, bayes seems to be "forgetting". I had a couple of mails I trained it on last week, and immediately after learning them, they were hitting BAYES_99. Today one is hitting BAYES_50, and the other is not hitting any bayes rules, which I take to mean the check_bayes algorithm is returning something between 0.4999 and 0.5001, as this is the only range not scored by some amount in the 23_bayes.cf file.

As for bayes settings in my local.cf, here is what I've got:

use_bayes 1
bayes_path /var/amavis/bayes
bayes_auto_expire 0
bayes_journal_max_size 300000
auto_learn 1

(I bumped bayes_journal_max_size up from the default 150000 in an effort to help bayes "remember" more.) I've left the auto-learning thresholds at their defaults (0.1 and 12.0, I believe).

Something, I assume, is polluting my bayes database. I'm thinking this is either misclassified email (users putting legit email in the spam folder to be learned) or auto-learning causing a problem. I've just turned off auto-learning in an effort to fix this problem. Does anyone have any other ideas/suggestions?

Some other possibly relevant info... Since I'm also running it with amavisd-new, I've got opportunistic expiring turned off and have a cron job run every night as follows, which is what I always did with 2.55:

# Every night at 1am, do an expiration of the bayes DB
0 1 * * * /bin/nice -n 19 /usr/bin/sa-learn --force-expire

One oddity I've noticed with sa-learn in 2.60 is that from this cron job I often get this output, which is what I expect, as it is what I saw with 2.55:

(from Sat 11/1 1:05am)
.......................
............................................................................
............................................................................
............................................................................
............................................................................
...........synced Bayes databases from journal in 64 seconds: 23388 unique entries (105282 total entries)
expired old Bayes database entries in 213 seconds
125886 entries kept, 189705 deleted
token frequency: 1-occurence tokens: 58.27%
token frequency: less than 8 occurrences: 21.22%

However, this is the output I got back from the cron job on the following 2 nights:

(Sun 11/2 1:02am)
...........................
synced Bayes databases from journal in 89 seconds: 27553 unique entries (190806 total entries)

(Mon 11/3 1:04am)
...................
synced Bayes databases from journal in 69 seconds: 19700 unique entries (92153 total entries)

No information on expiring old entries or token frequency. Is this normal? I am using the --force-expire flag; shouldn't it force this output?

Also, here is the info on my bayes_db from this morning. As you can see, I have far more spam than ham, but in the past this has not been a problem at all; bayes has been wonderful:

bash-2.05$ sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0     276932          0  non-token data: nspam
0.000          0      17474          0  non-token data: nham
0.000          0     267264          0  non-token data: ntokens
0.000          0 1067558608          0  non-token data: oldest atime
0.000          0 1067885333          0  non-token data: newest atime
0.000          0 1067885222          0  non-token data: last journal sync atime
0.000          0 1067843034          0  non-token data: last expiry atime
0.000          0     111408          0  non-token data: last expire atime delta
0.000          0     189705          0  non-token data: last expire reduction count

So, anyone else seeing such bayes issues? Any suggestions on tracking down the problem?

thanks!
johnS
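P.S. In case it helps anyone compare against their own database, here is a rough sketch that turns the raw numbers from the `sa-learn --dump magic` output above into something more readable. It assumes the atime values are Unix epoch seconds and that "last expire atime delta" is also in seconds; I have not confirmed either against the SA source. The `MAGIC` dict, `utc` and `summarize` are just names I made up for this example, not anything from SpamAssassin.

```python
from datetime import datetime, timezone

# Values copied by hand from the `sa-learn --dump magic` output above.
MAGIC = {
    "nspam": 276932,
    "nham": 17474,
    "ntokens": 267264,
    "oldest atime": 1067558608,
    "newest atime": 1067885333,
    "last journal sync atime": 1067885222,
    "last expiry atime": 1067843034,
    "last expire atime delta": 111408,
    "last expire reduction count": 189705,
}


def utc(ts):
    """Interpret ts as Unix epoch seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def summarize(magic):
    """Derive a few human-readable figures from the raw magic values."""
    span_seconds = magic["newest atime"] - magic["oldest atime"]
    return {
        "oldest": utc(magic["oldest atime"]),
        "newest": utc(magic["newest atime"]),
        "last_expiry": utc(magic["last expiry atime"]),
        # How many days of token history the database currently spans.
        "span_days": span_seconds / 86400,
        # Assumes the delta is expressed in seconds.
        "expire_delta_hours": magic["last expire atime delta"] / 3600,
        "spam_per_ham": magic["nspam"] / magic["nham"],
    }


if __name__ == "__main__":
    for key, value in summarize(MAGIC).items():
        print(f"{key}: {value}")
```

If those assumptions hold, the "last expiry atime" line can be checked against the time the cron job ran, even on the nights where sa-learn printed nothing about expiry.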