Re: [guinevere-discuss] Lint errors in 3.4

Joe Zitnik Tue, 18 Dec 2007 07:19:25 -0800

>>> On 12/18/2007 at 10:02 AM, Matt Kettler <[EMAIL PROTECTED]> wrote:
Joe Zitnik wrote:
> >>> On 12/18/2007 at 9:00 AM, Matt Kettler <[EMAIL PROTECTED]> wrote:
> Clay Davis wrote:
> > I've seen several people write this.  Can someone point me to some
> > debate I can review?  It seems to me that if you set the autolearn
> > threshold fairly high and keep an eye on your bayes scoring, it would
> > be a good thing.
> >  
> IMHO, autolearning is a good thing. However, exclusively autolearning
> without ever providing any manual training is a situation that can
> lead to a mislearning disaster. The autolearner is most vulnerable when
> it has to make judgments and there's no existing training to compare
> against.
>
> It's probably bad experience with that effect which has caused such
> gross over-reactions.
>
>
> You're exactly right, and in numerous posts on that forum, I've stated
> exactly that.  On at least three different occasions, I have had to
> scrap my bayes database, and resend all e-mail received within a given
> period because my bayes database became corrupted, either one way or
> the other.  In the years since that has happened, I have manually fed
> bayes, and between the rules I have added, and some additional
> plugins, not only have I never had that issue again, my spam catching
> is at an all-time high.  All by taking a few minutes every week to
> feed in the spam that's making it past the filters.  What may be a
> gross over-reaction to you seems perfectly sensible to me.  I'm sure
> there are people who have great success with it, but for me, it was
> NOTHING but trouble.  Mine is not the only story that I have read that
> has had exactly the same results.
Well, if you had trouble exclusively autolearning with no manual
training, perhaps the solution is to start using manual training in
addition to autolearning.


Also, generally speaking, you hear about the problems, but rarely hear
about the non-problems.

I've had autolearning enabled on the same bayes database I've been using
since the bayes feature was introduced in SpamAssassin 2.50 back in
February of 2003. I've never had to scrap my bayes database. Not once.
I'm still using the same database (with a couple format conversions
during various upgrades) that I pre-initialized with several hundred
hand-picked messages.

My only variation is that somewhere around SA 3.0 (Sept 2004) I lowered
the bayes_auto_learn_threshold_nonspam from the default to -0.001, and
added some rules with -0.001 scores that key off industry keywords. This
was largely a precautionary measure, but I felt a positive score for
this option was potentially dangerous. This is especially true if you
let your SA version get a little stale, as it becomes less effective
over time and spam is more likely to hit a 0 score. I wasn't having any
troubles prior to my change; I was just being paranoid because I knew I
was letting my SA version slip sometimes, and I never switched back.

YMMV, but on an otherwise well-maintained SA and bayes database,
auto-learning seems to work just fine.
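
To make the threshold point concrete, here is a rough sketch of how the
two auto-learn thresholds gate what gets learned. It is a simplified
model, not SpamAssassin's actual code: the function name is made up for
illustration, the exact boundary handling is an assumption, and the real
autolearner applies further conditions that are left out here. The 0.1
and 12.0 values are the stock defaults for
bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.

```python
from typing import Optional

# Stock defaults for the two SpamAssassin options discussed in this thread.
DEFAULT_NONSPAM_THRESHOLD = 0.1
DEFAULT_SPAM_THRESHOLD = 12.0


def autolearn_decision(
    score: float,
    nonspam_threshold: float = DEFAULT_NONSPAM_THRESHOLD,
    spam_threshold: float = DEFAULT_SPAM_THRESHOLD,
) -> Optional[str]:
    """Hypothetical helper: return "ham", "spam", or None (do not learn).

    Simplified assumption: only the message score and the two thresholds
    matter. The real autolearner checks more than this.
    """
    if score < nonspam_threshold:
        return "ham"   # low enough to be auto-learned as non-spam
    if score >= spam_threshold:
        return "spam"  # high enough to be auto-learned as spam
    return None        # in between: leave bayes alone


# A spam that slips through with no rule hits scores 0.
print(autolearn_decision(0.0))                            # default threshold
print(autolearn_decision(0.0, nonspam_threshold=-0.001))  # lowered threshold
```

Under these assumptions, a message scoring exactly 0 is learned as ham
with the default threshold but is skipped once the threshold is lowered
to -0.001, which is the mislearning risk described above.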






I never exclusively autolearned; just getting bayes working requires
some manual feeding.  The first time my bayes blew up it had been
running fine for over six months.  During that time I manually fed in
thousands of spam and ham.  The second time it may have even been
longer.  The third time I'll take responsibility for: I had it shut off,
but an upgrade overwrote the value and turned it back on.
It's there for a reason, and much smarter men than me are responsible
for the spamassassin project, so I have to imagine large numbers of
people have had success with it.  Once again, from my vantage point, I
was burned three different times with it, so I don't use it.
