>>> On 12/18/2007 at 10:02 AM, Matt Kettler <[EMAIL PROTECTED]> wrote:
Joe Zitnik wrote:
> >>> On 12/18/2007 at 9:00 AM, Matt Kettler <[EMAIL PROTECTED]> wrote:
> Clay Davis wrote:
> > I've seen several people write this. Can someone point me to some debate
> > I can review? It seems to me that if you set the autolearn threshold
> > fairly high and keep an eye on your bayes scoring, it would be a good
> > thing.
> >
> IMHO, autolearning is a good thing. However, exclusively autolearning
> without ever providing any manual training is a situation that can
> lead to a mislearning disaster. The autolearner is most vulnerable when
> it has to make judgments and there's no existing training to compare
> against.
>
> It's probably bad experience with that effect which has caused such
> gross over-reactions.
>
>
> You're exactly right, and in numerous posts on that forum, I've stated
> exactly that. On at least three different occasions, I have had to
> scrap my bayes database, and resend all e-mail received within a given
> period because my bayes database became corrupted, either one way or
> the other. In the years since that has happened, I have manually fed
> bayes, and between the rules I have added, and some additional
> plugins, not only have I never had that issue again, my spam catching
> is at an all time high. All by taking a few minutes every week to
> feed in the spam that's making it past the filters. What may be a gross
> over-reaction to you seems perfectly sensible to me. I'm sure there
> are people who have great success with it, but for me, it was NOTHING
> but trouble. Mine is not the only story that I have read that has had
> exactly the same results.

Well, if you had trouble exclusively autolearning with no manual training, perhaps the solution is to start using manual training in addition to autolearning.

Also, generally speaking, you hear about the problems, but rarely hear about the non-problems.

I've had autolearning enabled on the same bayes database I've been using since the bayes feature was introduced in SpamAssassin 2.50 back in February of 2003. I've never had to scrap my bayes database. Not once. I'm still using the same database (with a couple of format conversions during various upgrades) that I pre-initialized with several hundred hand-picked messages.

My only variation is that somewhere around SA 3.0 (Sept 2004) I lowered bayes_auto_learn_threshold_nonspam from the default to -0.001, and added some rules with -0.001 scores that key off industry keywords. This was largely a precautionary measure, but I felt a positive score for this option was potentially dangerous. This is especially true if you let your SA version get a little stale, as it becomes less effective over time and spam is more likely to hit a 0 score. I wasn't having any troubles prior to my change. I was just being paranoid because I knew I was letting my SA version slip sometimes, and I never switched back.

YMMV, but on an otherwise well-maintained SA and bayes database, auto-learning seems to work just fine.

I never exclusively autolearned; just getting bayes working requires some manual feeding. The first time my bayes blew up, it had been running fine for over six months. During that time I manually fed in thousands of spam and ham. The second time it may have been even longer. The third time I'll take responsibility for: I had it shut off, but an upgrade overwrote the value and turned it back on.

It's there for a reason, and much smarter men than me are responsible for the spamassassin project, so I have to imagine large numbers of people have had success with it. Once again, from my vantage point, I was burned three different times with it, so I don't use it.
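
For anyone following along, the threshold change Matt describes might look roughly like the fragment below in local.cf. This is only a sketch, not his actual configuration: the rule name LOCAL_INDUSTRY_TERM and the keyword pattern are hypothetical placeholders, and the spam-side threshold is simply left at its stock value. The option names themselves are standard SpamAssassin settings (the stock nonspam threshold is 0.1).

```
# Sketch only: rule name and keyword pattern below are hypothetical.

# Keep autolearning on, but only learn a message as ham when its
# autolearn score is below zero (stock default is 0.1).
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  -0.001

# Spam-side threshold left at the stock default.
bayes_auto_learn_threshold_spam     12.0

# A tiny negative-scoring rule keyed off an industry keyword, so that
# legitimate mail can still dip below the nonspam threshold.
body      LOCAL_INDUSTRY_TERM  /\bexample-industry-term\b/i
describe  LOCAL_INDUSTRY_TERM  Mentions a keyword common in our legitimate mail
score     LOCAL_INDUSTRY_TERM  -0.001
```

With a negative nonspam threshold, a message that hits no rules at all (score 0) is not autolearned as ham. Manual feeding, which both sides of this thread recommend, is done separately with sa-learn --spam and sa-learn --ham.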