Aaron Grewell wrote:
Hi Matt, I'm interested in how your setup compares to mine.  I also find
Bayes very useful, but I haven't gotten it to work as well as what
you've described.

Interesting. For me, BAYES_99 is right between SURBL and URIBL in terms of hits (it has 98.91% of URIBL's total hits). I find it completely indispensable.


Are you using a single site-wide database, or is this a per-user setup?


I'm not Matt, but I'm running a very similar setup that works very well, so I thought I would comment too. I'm running a single site-wide database. All mail is processed under my spamd user.


I rarely train manually, except at initial setup where I feed it a good base learning. (the autolearner can sometimes go awry if you don't train some mail manually before letting it go.)


The trouble I had with the autolearner was that some spammers would send
innocuous mail through to raise their scores until Bayes decided they
were ok, then start spamming.  That was a couple of versions back; does
that sort of thing no longer work?


I rarely train manually either. The only messages I train (and only because there is nothing else to train) are spam that was correctly identified as such but has autolearn=no because it did not meet the autolearn criteria. These almost always hit BAYES_99 and score around 20, but most likely did not have enough header points to be autolearned.
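
For anyone who wants to pick those messages out automatically, here is a minimal Python sketch. It assumes the X-Spam-Status header looks roughly like "Yes, hits=20.1 required=5.0 tests=BAYES_99,... autolearn=no version=2.64" (the exact layout may differ between versions, so check your own headers first). The function name is made up for illustration.

```python
import re
from email import message_from_string


def is_manual_training_candidate(raw_message):
    """Return True for mail tagged as spam that hit BAYES_99 but has autolearn=no.

    Assumes an X-Spam-Status header with "tests=" and "autolearn=" fields;
    this layout is an assumption, not something guaranteed for every version.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The rule list is usually folded across lines after commas; rejoin it,
    # then collapse any remaining whitespace runs to single spaces.
    status = re.sub(r",\s+", ",", status)
    status = " ".join(status.split())
    if not status.lower().startswith("yes"):
        return False
    tests = re.search(r"tests=(\S*)", status)
    rules = [r for r in tests.group(1).split(",") if r] if tests else []
    autolearn = re.search(r"autolearn=(\w+)", status)
    return (
        "BAYES_99" in rules
        and autolearn is not None
        and autolearn.group(1) == "no"
    )
```

Messages for which this returns True are the ones worth handing to sa-learn as spam by hand.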

I didn't even start by training my database manually. I started from scratch and let the autolearner do its thing. I have never had to correct what it did because it was always right. The poison that spammers like to include in messages doesn't appear to have any effect on the overall outcome of the Bayes score. I don't really know why this is; it just works.

NOTE: to operate in this fashion I believe it is imperative that you change the autolearn thresholds. The defaults are dangerous! (At least in 2.64, which I still run.) I have mine set as follows:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0

To date (I have been running this for over 2 years) I have yet to see the autolearner misclassify a message. Most Bayes hits are at the far extremes (BAYES_99 and BAYES_00), with only a few in the 80-90 range.
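
As a rough illustration of what the two threshold settings above do, here is a small Python sketch of the decision. It is a simplification under stated assumptions, not SpamAssassin's actual code: the real autolearner applies extra conditions (such as the header points mentioned earlier), and the exact handling of a score that lands right on a threshold is assumed here.

```python
def autolearn_decision(score, nonspam_threshold=-0.1, spam_threshold=10.0):
    """Return "ham", "spam", or None (do not learn) for a message score.

    Assumptions for illustration only:
      * a score below nonspam_threshold is learned as ham,
      * a score at or above spam_threshold is learned as spam,
      * anything in between is left alone.
    The default values mirror the two config lines quoted above.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


# With a negative ham threshold, a message scoring exactly 0.0 is NOT
# learned as ham; only mail that hit negative-scoring rules qualifies.
for s in (-0.5, 0.0, 4.9, 20.0):
    print(s, autolearn_decision(s))
```

The point of the negative ham threshold is visible in the loop: mail that merely failed to trigger any rule is never treated as known-good.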


On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham).
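
A trap-feeding cron job along those lines might look something like the Python sketch below. The directory paths are hypothetical placeholders, and the script assumes sa-learn is on the PATH and accepts a directory as its target; check the sa-learn manual page for your version, since older releases may want an explicit format option.

```python
import subprocess


def sa_learn_commands(spam_dir, ham_dir):
    """Build the two sa-learn invocations, one per trap directory."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def feed_traps(spam_dir, ham_dir, run=subprocess.run):
    """Run sa-learn over both trap directories, stopping on the first failure.

    `run` is injectable so the function can be exercised without
    SpamAssassin installed.
    """
    for cmd in sa_learn_commands(spam_dir, ham_dir):
        run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical locations; point these at wherever your traps deliver.
    feed_traps("/var/spool/traps/spam", "/var/spool/traps/ham")
```

Run it from cron as the same user that owns the Bayes database (the spamd user in a site-wide setup), so the learned tokens end up in the database that scanning actually uses.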


I'd love to make my Bayesian database more effective, is there a doc
somewhere that describes how you tuned it to your environment?


I doubt there is anything that specific, and if there were, it most likely wouldn't help you in your situation. There are general tuning notes on the SA website and elsewhere, but you really just have to try things and see what works and what doesn't in your setup. What works well for one person may not work at all for someone else.

-Jim
