Re: Best Practices: SpamAssassin

Paolo Cravero as2594 Fri, 31 Mar 2006 06:26:36 -0800

Ryan Kather wrote:

I'll answer some parts...

Ideas: -------- Postfix- I would prefer to use SpamAssassin as a
store and forward mail filtering relay appliance.  It seems if I
place a Postfix Linux MTA in front of my existing spam solution I
could setup test groups.  100 users could be forwarded to the
SpamAssassin test box and passed internally to GroupWise.  100 users
could be forwarded to the DSPAM test box and passed internally to
GroupWise.  The rest of the users would be forwarded to the Symantec
Mail Security Gateway and passed internally to GroupWise (until such

Wouldn't it make more sense to pass the same message through each
system under test?
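
A minimal sketch of that idea, in Python: feed one labelled set of messages to every filter and count each filter's mistakes. The filter callables and the sample data are hypothetical stand-ins for illustration, not real SpamAssassin, DSPAM, or Symantec interfaces.

```python
from typing import Callable, Dict, Iterable, Tuple

# A filter is modelled as a callable returning True when it flags spam.
Filter = Callable[[str], bool]


def compare_filters(
    filters: Dict[str, Filter],
    messages: Iterable[Tuple[str, bool]],
) -> Dict[str, Dict[str, int]]:
    """Run every filter on the same (text, is_spam) pairs and count errors."""
    samples = list(messages)  # materialise once so each filter sees all of it
    report: Dict[str, Dict[str, int]] = {}
    for name, classify in filters.items():
        false_pos = 0
        false_neg = 0
        for text, is_spam in samples:
            verdict = classify(text)
            if verdict and not is_spam:
                false_pos += 1  # ham wrongly flagged as spam
            elif not verdict and is_spam:
                false_neg += 1  # spam that slipped through
        report[name] = {
            "false_positives": false_pos,
            "false_negatives": false_neg,
        }
    return report


# Toy stand-ins for the systems under test (assumptions, not real filters).
toy_filters = {
    "keyword": lambda text: "viagra" in text.lower(),
    "paranoid": lambda text: True,
}
toy_messages = [
    ("Cheap VIAGRA now", True),
    ("Meeting at 10am", False),
    ("You won a prize", True),
]
result = compare_filters(toy_filters, toy_messages)
```

Because every filter sees identical traffic, the error counts are directly comparable, which is not the case when each system gets a different group of 100 users.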

I would prefer to use LDAP to validate recipients for SpamAssassin
and DSPAM which should be possible with Postfix.


Yup! LDAP forever!

I think I could accomplish this scenario with Postfix Transports,
though I may need to run multiple instances of Postfix.  Does anyone
see a flaw in this?

You should be able to look up in LDAP a custom attribute that means
next-hop hostname. You need some LDAP work, but very basic, and you're
set!
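
To make the lookup concrete, here is a rough Python sketch of the logic only, not a Postfix configuration. The attribute name "mailNextHop", the hostnames, and the in-memory directory are all made up for the example; a real setup would query your LDAP server through a Postfix lookup table.

```python
from typing import Dict, Optional

# Hypothetical directory contents: recipient address -> LDAP attributes.
# "mailNextHop" is an invented attribute name for this sketch.
DIRECTORY: Dict[str, Dict[str, str]] = {
    "alice@example.com": {"mailNextHop": "sa-test.example.com"},
    "bob@example.com": {"mailNextHop": "dspam-test.example.com"},
    "carol@example.com": {},  # no custom next hop set
}

# Assumed fallback for users who are not in a test group.
DEFAULT_NEXT_HOP = "symantec-gw.example.com"


def transport_for(recipient: str) -> Optional[str]:
    """Return a transport-table style result, or None for unknown recipients."""
    entry = DIRECTORY.get(recipient.lower())
    if entry is None:
        # Not in the directory: recipient validation would reject this address.
        return None
    host = entry.get("mailNextHop", DEFAULT_NEXT_HOP)
    # "smtp:[host]" is the transport:nexthop form; the brackets skip MX lookups.
    return f"smtp:[{host}]"
```

The same directory entry answers two questions at once: whether the recipient exists, and which test box the message should be relayed to.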

possible to provide a fair performance picture versus SpamAssassin


Performance... are you hunting for speed or accuracy?
(perhaps you wrote it before and I missed it)

It appears many seem to be using the Amavisd-new + Postfix +
SpamAssassin configuration.  Is there a reason not to use this
design?  I have had good luck with this in the past.

This is a very good combination. Amavisd-new allows per-user (!) LDAP
profiling and SQL quarantine management.
I'm running both Postfix+SA and postfix+amavis+SA+clamav+mailzu+LDAP on
two different MX for different domains. Although the latter setup
requires more powerful hardware (not necessarily, if your 4000 users
have steady traffic and won't grow), it is much more manageable.


Your review should take into account also these frills!

I also have read a lot where people are improving accuracy by
increasing the scoring of the Bayesian database (which needs

[...]

can I ensure user false positives are easily reportable?  What do
others do to train the Bayesian database?  Maia-Mailguard?

After the initial setup, Bayes can live more or less its own life with
broad enough autolearn thresholds. We do not let users submit stuff for
training (80k users!) but rather submit meaningful samples occasionally.
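
For reference, the autolearn decision boils down to two cut-offs. The Python sketch below is a simplification: the defaults mirror SA's stock bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0), but real SA applies extra conditions before learning, and the exact handling of a score that lands on a threshold is an assumption here.

```python
from typing import Optional

# Stock SpamAssassin values for bayes_auto_learn_threshold_nonspam / _spam.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(
    score: float,
    nonspam_threshold: float = NONSPAM_THRESHOLD,
    spam_threshold: float = SPAM_THRESHOLD,
) -> Optional[str]:
    """Return "ham", "spam", or None when the score is too ambiguous to learn.

    Simplified: boundary handling (strict comparisons) is an assumption, and
    SA's additional autolearn conditions are not modelled.
    """
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None  # in between: leave the Bayes DB alone
```

Lowering the spam threshold or raising the non-spam one makes Bayes learn from more messages, at the price of occasionally learning from a misclassified one.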

We've also found that spammers are targeting common addresses such as
info@, software@, john@, ... which were not used on some domains. So we
transformed those into spamtraps (with LDAP's mailAcceptingGeneralId or
mailAlternateAddress this is pretty straightforward!), which we manually
review and feed to an IMAP folder for autospamlearn. HAM learning is
unfortunately underestimated and more rarely done, out of our own HAM
messages.

I could pretty much trust a small subset of users to be fairly
regular in their training.  There is a somewhat larger portion of

They might tell less trustworthy users how to take part in the training
process, and those could then break your Bayes DB. Those less-smart
users should be managed with amavisd-new LDAP profiling.

use some kind of common database.  In the default configuration SA
uses one Bayesian database for all users.  Is there a reason to
change this?  What is the consensus on a shared ruleset versus
individual rulesets?

If your users share common-type messages, I'd go for a common Bayes DB.
We do have a common one for all our domains (actually one for old and
another for new SA servers). Individual Bayes DBs get large and if they
break you've got to troubleshoot each individually...

Shared rulesets, with custom rules for special cases (using SA __META
rules) evaluated for each message.

It also seems that there is a falling out between pyzor, dcc, razor,
and the community.  Is it simply a licensing issue (with legal

Can't comment on this, but consider running a DNS cache if you plan to
use SA's DNS tests!

What about an initial corpus to train the Bayesian database?  Will
this hurt my accuracy in the long term?  What corpuses are being
used?  Am I better off letting the Bayesian autolearn gradually
perform this function?

You don't keep your spam, do you? :-) Train the DB with your *own*
(company's) spam and ham corpus. It will not hurt. Don't use public
corpuses.
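
If the corpus is kept as mbox files, the training step can be scripted around sa-learn. The Python helper below only builds the command line; the function name and the paths are placeholders for this sketch, and it assumes the corpus really is in mbox format.

```python
from typing import List


def sa_learn_command(kind: str, mbox_path: str) -> List[str]:
    """Build the sa-learn argument list for one mbox file.

    kind must be "spam" or "ham"; anything else is rejected so a typo
    cannot silently train the wrong class.
    """
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    # --spam / --ham select the class, --mbox tells sa-learn the file format.
    return ["sa-learn", f"--{kind}", "--mbox", mbox_path]
```

To actually run it, pass the list to subprocess.run(cmd, check=True) as the user that owns the Bayes database, once for the spam mbox and once for the ham mbox.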

SpamAssassin is typically represented as a magic dance of tweaking
rules.  Are the default rule thresholds good values to start at?  How
can I adequately decide which rules to tweak and how much to tweak
them by?  In other words, how do you manage your adjustments without
users noticing wide spam classifying variations?

We do not adjust rule scores. Not with SA 3.1, though we did adjust the
Bayes scores on SA 2.6. Since most of our traffic is non-English, this
helped a bit.


Default values are the most suitable for each rule.

Also, in regards to rules.  What is the preferred method for update?
Official rule releases, rulesdujour, custom?  All of the above?

Test them and decide which apply to your case. Dunno how independent
your current antispam solution is; with SA you need to invest some time
to review false negatives/positives (if any) and review extra rulesets.

How have people fared with MySQL replication of the DB?  I will need
this solution to present the same data for backup MX which is not
local to the primary MX.

First of all: we dropped the secondary MX record because it received
more spam than the primary. We use a load balancer for HA.

What do you want to store on MySQL? Bayes, AWL, quarantine are your
non-mutually exclusive options.

Bayes and AWL can be regenerated in a matter of minutes, and you can
start (I mean "power up") a backup MX without them.
Replicating quarantine is like replicating your trash between two bins.
If you provide delegated quarantine, how likely is it that a HW failure
will destroy a false positive? You're probably better off without the
MySQL master-slave replication hassle.

AFAIK there is a MySQL master-master replication function, but its
limitations make it incompatible with amavis SQL needs.



<OT MODE ON>
X-Mailer: Novell GroupWise Internet Agent 6.0.4

OMG! It formatted your message paragraphs without breaking-up lines!
Luckily Thunderbird has a rewrap function!


<OT MODE OFF>

Have a nice weekend!
Paolo Cravero

--
|    QRPp-I #707  + www.paolocravero.tk +  I QRP #476   |
| SpamAssassin-based email antispam/antivirus solutions |
 \    Italian/English-to/from-Croatian translations    /
  \                   Skype: pcravero                 /
