Aaron Grewell wrote:
Hi Matt, I'm interested in how your setup compares to mine. I also find
Bayes very useful, but I haven't gotten it to work as well as what
you've described.
Interesting. For me, BAYES_99 is right between SURBL and URIBL in terms
of hits. (And has 98.91% of URIBL's total hits.) I find it completely
indispensable.
Are you using a single site-wide database, or is this a per-user setup?
I'm not Matt, but I run a very similar setup that works very well, so I
thought I would comment too. I'm running a single site-wide database.
All mail is processed under my spamd user.
I rarely train manually, except at initial setup where I feed it a good
base learning. (The autolearner can sometimes go awry if you don't train
some mail manually before letting it go.)
The trouble I had with the autolearner was that some spammers would send
innocuous mail through to raise their scores until Bayes decided they
were ok, then start spamming. That was a couple of versions back; does
that sort of thing no longer work?
I rarely train manually as well. The only messages I train (and only
because there is nothing else to train) are spam that was correctly
identified as such but has autolearn=no because it did not meet the
autolearn criteria. These almost always hit BAYES_99 and score 20 or so,
but most likely did not have enough header points to be autolearned.
I didn't even start by training my database manually. I started from
scratch and let the autolearner do its thing. I have never had to
correct what it did because it was always right. The poison that
spammers like to include in messages doesn't appear to have any effect
on the overall outcome of the Bayes score. I don't really know why this
is, it just works.
NOTE: to operate in this fashion, I believe it is imperative that you
change the autolearn thresholds. The defaults are dangerous! (At least
in 2.64, which I still run.) I have mine set as follows:
bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0
To date (I have been running this for over 2 years) I have yet to see
the autolearner misclassify. Most Bayes hits are at the far extremes
(BAYES_99 and BAYES_00), with only a few in the 80-90 range.
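
To make the effect of those two thresholds concrete, here is a rough
sketch in Python. It is not SpamAssassin's actual code: the function
name and the exact boundary handling (below vs. at-or-above) are my own
assumptions, and the real autolearner applies extra criteria (such as
the header points mentioned above) that are left out here.

```python
# Simplified sketch of threshold-based autolearning. NOT SpamAssassin's code.
# The real autolearner has additional criteria that are deliberately omitted.

HAM_THRESHOLD = -0.1   # bayes_auto_learn_threshold_nonspam from this post
SPAM_THRESHOLD = 10.0  # bayes_auto_learn_threshold_spam from this post


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Hypothetical helper: return 'ham', 'spam', or 'no' for a score."""
    if score < ham_threshold:
        # Only clearly negative mail is learned as ham (assumed strict <).
        return "ham"
    if score >= spam_threshold:
        # Only very high-scoring mail is learned as spam (assumed >=).
        return "spam"
    # Everything in between is left alone.
    return "no"


if __name__ == "__main__":
    for s in (-0.5, 0.0, 5.0, 20.0):
        print(s, autolearn_decision(s))
```

With these settings a message scoring 0.0 or 5.0 is never learned
automatically; only mail that is clearly negative or scores 10 or more
feeds the database.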
On a day to day basis I mostly feed automatically with a cronjob that
collects mail via spamtraps and hamtraps. I have that coupled with
autolearning that's set a bit differently than the defaults. (IMNSHO,
having a ham learning threshold that's positive is suicide, but I also
have a large number of small negative-score rules so I can keep my
threshold at -0.01 and actually autolearn some ham).
I'd love to make my Bayesian database more effective. Is there a doc
somewhere that describes how you tuned it to your environment?
I doubt there is anything that specific, and if there were, it most
likely wouldn't help you in your situation. There are general tuning
notes on the SA website and such, but you really just have to try and
see what works and what doesn't in your setup. What works well for one
person may not work at all for someone else.
-Jim