Ryan Kather wrote:
>
> SpamAssassin-
> Now here is where I need the help (assuming my postfix section was
> sound). I want to make sure this is as optimized as possible to
> provide a fair performance picture versus SpamAssassin and Barracuda.
>
> It appears many seem to be using the amavisd-new + Postfix +
> SpamAssassin configuration. Is there a reason not to use this
> design? I have had good luck with this in the past.

There are several ways to call SpamAssassin. If you have used Amavis and are familiar with its configuration, there is no reason not to use it. Just keep in mind that when you use SA through Amavis, Amavis controls the spam threshold and message markup.

> I also have read a lot where people are improving accuracy by
> increasing the scoring of the Bayesian database (which needs
> training). What would the optimal training method be, given my
> environment? I could create a shared GroupWise IMAP folder for
> unclassified spam with a cron job to read this into sa-learn. I
> cannot have a central IMAP folder for false positives, however, as
> other users must not be able to view the email for other users. How
> can I ensure user false positives are easily reportable? What do
> others do to train the Bayesian database? Maia-Mailguard?

You need to be able to train on both spam and ham. Is it possible to create an IMAP folder that your users would be able to put messages into, but not view? I would also agree with another poster who said that you should widen the Bayes learning thresholds. This is especially true if you are not going to start with manual learning up front.

> I could pretty much trust a small subset of users to be fairly
> regular in their training. There is a somewhat larger portion of
> users who would train here and there. Lastly, the largest portion of
> users may never train. We also do not know which user belongs to
> which group (yet). With this scenario it seems that I will have to
> use some kind of common database. In the default configuration SA
> uses one Bayesian database for all users. Is there a reason to
> change this? What is the consensus on a shared ruleset versus
> individual rulesets?

Actually, the default SpamAssassin configuration uses per-user databases. It is Amavis that forces you to use a common database.
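
For reference, here is a minimal sketch of how the two Bayes points above (wider learning thresholds, and a shared rather than per-user database) might look in SpamAssassin's local.cf. The option names are standard SpamAssassin settings, but the threshold values and the database path are illustrative assumptions, not recommendations from this thread.

```
# Auto-learn thresholds. SpamAssassin's defaults are 0.1 (ham) and
# 12.0 (spam). "Widening" here means letting more messages qualify
# for auto-learning. The values below are assumed examples only.
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  1.0
bayes_auto_learn_threshold_spam     8.0

# One site-wide Bayes database instead of the per-user default
# (~/.spamassassin/bayes). The path is a hypothetical example.
bayes_path       /var/spool/spamassassin/bayes
bayes_file_mode  0770
```

Looser thresholds let the database fill faster, at the cost of a higher chance of learning a misclassified message, so it is probably worth checking the results against some manually trained mail.
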
A common database is easier to manage, but a per-user database will be more accurate (especially if the user trains it manually).

> It also seems that there is a falling out between pyzor, dcc, razor,
> and the community. Is it simply a licensing issue (with legal
> implications), or are these systems flawed otherwise? What
> alternatives are there? Do I even need this functionality? Has
> anyone seen a detriment to SpamAssassin's performance without DCC,
> Pyzor, or Razor?

I think it is mainly the licensing issue. Razor2 has recently changed its licensing so that it is available for everyone. I use all three on my server and get good results from them.

> What about an initial corpus to train the Bayesian database? Will
> this hurt my accuracy in the long term? What corpuses are being
> used? Am I better off letting the Bayesian autolearn gradually
> perform this function?

Since everyone's spam and ham are different, a generic corpus will not get you very far. The main advantage of Bayes is that it learns about YOUR spam and ham and classifies messages accordingly. If you train it from a generic corpus, your results will not be nearly as good.

> SpamAssassin is typically represented as a magic dance of tweaking
> rules. Are the default rule thresholds good values to start at? How
> can I adequately decide which rules to tweak and how much to tweak
> them by? In other words, how do you manage your adjustments without
> users noticing wide spam classifying variations?

I have not done any score tweaking at my site. I find that the default rules do very well. The only one that you might want to tweak is the BAYES_99 rule once your Bayes database is performing well.

> Also, in regards to rules. What is the preferred method for update?
> Official rule releases, rulesdujour, custom? All of the above?

All of the above. Rules_du_jour is extremely useful for keeping the SARE rules up to date.
I would suggest that you visit the SARE rule site www.rulesemporium.com and grab any of the rulesets there that make sense for you. They do a good job of describing the rulesets, and several of them have different versions depending on your tolerance for false positives. Then configure rules_du_jour to keep them up to date for you. sa-update can keep you up to date on the official releases. And it is always possible to create your own rules if there is a specific spam that keeps getting by.

> How have people fared with MySQL replication of the DB? I will need
> this solution to present the same data for backup MX which is not
> local to the primary MX.

Haven't tried this. Everything is on one machine for me.

> Thanks for any assistance and recommendations you can make. It is
> probably impossible to make a balanced and unbiased comparison of SA
> to DSPAM, but I can try I suppose.

Difficult maybe, but not impossible. You are on the right road. The main thing is to take the time to tweak both of them so that they are running at their best when you make the comparison.

--
Bowie