Re: Bayes DB does not grow anymore

GRP Productions 18 Mar 2005 08:38:47 -0000

Thanks for the offer. You can send it to the email address I use for this list, or you could just send me an FTP URL for retrieval.

Sorry I did not find the time to do this, but I will try to send it during the weekend.

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's off by default) and you should use custom rules. I use a set of carefully chosen rulesets mostly from SARE and updated via rulesdujour and some more rules of my own accumulated over time.

It seems SURBL is now enabled by default. It has also changed its name to URIDNSBL :-) I do not use SARE rules (although I am trying to find time to look at them, as I am aware of their credibility). I use Gray's rules (http://files.grayonline.id.au); they seem quite effective.

I think on a heavy-traffic machine it's preferable to have auto-expiry off, especially when using MailScanner. Otherwise the expiry can kick in at random times every few hours (you can set a minimum time, though, for instance one day). Some people run a scheduled expiry three times a day. That's advice which often comes up on the MailScanner list (which is a very helpful list, btw). How often you need it depends on how quickly the db reaches the size limit you want to hold. Starting with one expiry per night should be fine, but you should occasionally expire manually and look at the output, in case there are problems.

No. One should get rid of really old tokens, they are only "ballast" in the db. I don't know how a big db behaves on a busy site. Ours contain 1 million tokens each and have a size of 40 MB. They work very well with no resource hogging. But I have only a few thousand messages running through each of our servers, there's probably none which gets more than 10,000 a day. If you get 100,000 it may be different.
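For anyone who wants to watch the token count instead of guessing, here is a minimal Python sketch that reads the counters printed by `sa-learn --dump magic`. The sample text, its numbers, the function names and the `max_tokens` threshold are all made up for illustration, and the line layout is what I would expect from a 3.0-era install, so check it against your own output first.

```python
import re

# Illustrative sample only: the layout imitates `sa-learn --dump magic`,
# the numbers are invented.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      41250          0  non-token data: nspam
0.000          0      18730          0  non-token data: nham
0.000          0    1000000          0  non-token data: ntokens
0.000          0 1108000000          0  non-token data: oldest atime
0.000          0 1111130000          0  non-token data: newest atime
"""

# Third column holds the value, the text after "non-token data:" is its name.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def token_age_days(counters):
    """Span between the oldest and newest token access times, in days."""
    span = counters["newest atime"] - counters["oldest atime"]
    return span / 86400.0


def needs_expiry(counters, max_tokens):
    """True when the db holds more tokens than the (hypothetical) limit."""
    return counters["ntokens"] > max_tokens
```

In practice you would feed `parse_magic` the real output of `sa-learn --dump magic` (run as the user that owns the Bayes db) and pick `max_tokens` to match the size you want to hold.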

I understand what you say. The point is, what should the criteria be for deciding that the time for an expiration has come? I mean, taking only the size into consideration could be a problem. What if some old tokens are still common nowadays in spam mail? You could say it doesn't matter: learning will start again and recognize all the bad stuff. In that sense, we could just stop maintaining Bayes completely.

That's what we do. I only learn messages which were categorized wrong. Not by Bayes, but by SA. Most messages which get a score lower than 5 still get a BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn these messages because they are spam and it reassures Bayes that they are spam. BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

As I told you, since my last post I have reset everything. It seems to me it works fine, and it learns rapidly. It gives me no reason not to trust it, to the point that I have set my SA score to be more or less equal to the BAYES_99 score (around 8). Of course I keep doing mistake-based learning, but most of the time I feed it with 'subjective' spam mail (i.e. mail that my users don't want to receive, but that is definitely not spam). I monitor it constantly and I am happy about it.

No problem :-) I tend to be a bit snappy on first messages which look to me like the author could have done a bit more research, but once we are over that stage I hope I can give some good advice based on my experience.

I have to admit that our communication was valuable to me, I learned so much about how the whole thing works. Once again, I appreciate it.

Greg
