Re: rules better than bayes?

Jim Maul Tue, 10 Jan 2006 09:38:52 -0800

Aaron Grewell wrote:

Hi Matt, I'm interested in how your setup compares to mine.  I also find
Bayes very useful, but I haven't gotten it to work as well as what
you've described.
Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits) I find it completely indispensable.
Are you using a single site-wide database, or is this a per-user setup?

I'm not Matt, but I'm running a very similar setup that works very well, so I thought I would comment too. I'm running a single site-wide database. All mail is processed under my spamd user.

I rarely train manually, except at initial setup where I feed it a good base learning. (the autolearner can sometimes go awry if you don't train some mail manually before letting it go.)
The trouble I had with the autolearner was that some spammers would send
innocuous mail through to raise their scores until Bayes decided they
were ok, then start spamming.  That was a couple of versions back, does
that sort of thing no longer work?

I rarely train manually as well. The only messages I train (and only because there is nothing else to train) are spam that was correctly identified as such but has autolearn=no because it did not meet the autolearn criteria. These almost always have BAYES_99 and a score of 20 or so, but most likely did not have enough header points to be autolearned.
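As a rough illustration of picking out those messages, here is a minimal Python sketch that checks the value of an X-Spam-Status header for spam that hit BAYES_99 but carries autolearn=no. The function name and the sample header are made up for the example, and the field layout (hits=, tests=, autolearn=) is assumed from typical SpamAssassin output, so adjust it to what your version actually writes.

```python
import re

def needs_manual_training(status_value: str) -> bool:
    """True for spam that hit BAYES_99 but was not autolearned.

    `status_value` is the text after "X-Spam-Status:" (hypothetical helper,
    assumes the usual "Yes, hits=... tests=A,B,C autolearn=no" layout).
    """
    # Folded headers break the tests list after a comma; rejoin it first.
    flat = re.sub(r",\s+", ",", status_value).strip()
    is_spam = flat.lower().startswith("yes")
    match = re.search(r"tests=(\S*)", flat)
    tests = match.group(1).split(",") if match else []
    not_learned = re.search(r"autolearn=no\b", flat) is not None
    return is_spam and "BAYES_99" in tests and not_learned

# Made-up sample header value, for illustration only.
sample = ("Yes, hits=20.3 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
          "\tURIBL_SBL autolearn=no version=2.64")
print(needs_manual_training(sample))
```

Messages for which this returns True are the ones worth feeding to the learner by hand.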

I didn't even start by training my database manually. I started from scratch and let the autolearner do its thing. I have never had to correct what it did because it was always right. The poison that spammers like to include in messages doesn't appear to have any effect on the overall outcome of the Bayes score. I don't really know why this is; it just works.

NOTE: to operate in this fashion I believe it is imperative that you change the autolearn thresholds. The defaults are dangerous! (At least in 2.64, which I still run.) I have mine set as such:


bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0

To this date (I've been running this for over 2 years) I have yet to see the autolearner misclassify. Most Bayes hits are at the far extremes (bayes_99 and bayes_0), with only a few in the 80-90 range.
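To make the effect of the two thresholds above concrete, here is a simplified Python sketch of the decision they drive. It is only an approximation: the real autolearner scores a message with certain rules left out and applies further conditions (such as the header points mentioned earlier), and the strict comparisons at the boundaries are an assumption on my part.

```python
NONSPAM_THRESHOLD = -0.1  # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 10.0     # bayes_auto_learn_threshold_spam

def autolearn_decision(score: float,
                       nonspam_threshold: float = NONSPAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> str:
    """Simplified model: return "ham", "spam", or "no" for a message score."""
    if score < nonspam_threshold:
        return "ham"    # low enough to be learned as non-spam
    if score > spam_threshold:
        return "spam"   # high enough to be learned as spam
    return "no"         # in between: left alone

for s in (-1.5, 0.0, 4.9, 20.0):
    print(s, autolearn_decision(s))
```

With these settings a message scoring 0.0 is never learned as ham, which is the point of keeping the nonspam threshold negative.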

On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham).
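A cron-driven feed like the one described could look roughly like the Python sketch below, which only builds (and optionally runs) sa-learn commands for a spamtrap and a hamtrap directory. The directory paths and function names are hypothetical, and how the trap mail gets collected into those directories is left out entirely.

```python
import subprocess

def build_training_commands(spamtrap_dir: str, hamtrap_dir: str) -> list:
    """Return the sa-learn invocations for one training run."""
    return [
        ["sa-learn", "--spam", spamtrap_dir],  # learn trapped spam
        ["sa-learn", "--ham", hamtrap_dir],    # learn trapped ham
    ]

def train(spamtrap_dir: str, hamtrap_dir: str, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in build_training_commands(spamtrap_dir, hamtrap_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if sa-learn fails
```

Calling train("/var/spool/spamtrap", "/var/spool/hamtrap") with the default dry_run=True just prints the two commands, which is a safe way to check the paths before scheduling it from cron as the user that owns the Bayes database.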
I'd love to make my Bayesian database more effective, is there a doc
somewhere that describes how you tuned it to your environment?

I doubt there is anything that specific, and if there were, it most likely wouldn't help you in your situation. There are general tuning notes on the SA website and such, but you really just have to try and see what works and what doesn't in your setup. What works well for one person may not work at all for someone else.


-Jim
