Ryan Kather wrote:
I'll answer some parts...
Yes, from a purely testing perspective. I don't have that liberty, since I am testing on live production. I suppose I could run all received messages for all users through all filters and then only deliver to those users who have opted into the various test breakdowns. I'll look into this, as I think it would give a better picture of the accuracy than a subset of my users would.
One thing you could also do is test a combination of SA+dspam:
- dspam per user (with or without initial training); no groups.
- SA (either with a site-wide Bayes or with no Bayes).
- SA is used to train dspam when the SA score is "sure" (< 0 or > 8, for
example). This would implement autolearn for dspam based on SA.
- Additionally, you can skip SA if the dspam confidence is > 0.7, for
example (or if the user dictionary is mature).
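As a sketch, the policy above could look like the following. The function names (classify_dspam, score_sa, train_dspam) are stand-ins for calls to the real filters, not their actual APIs; the thresholds mirror the ones suggested (dspam confidence 0.7, SA "sure" scores < 0 or > 8).

```python
# Hypothetical glue logic for the SA+dspam combination described above.
SA_SURE_HAM = 0.0      # below this, SA is "sure" the mail is ham
SA_SURE_SPAM = 8.0     # above this, SA is "sure" the mail is spam
DSPAM_CONFIDENT = 0.7  # above this, trust dspam and skip SA

def filter_message(msg, classify_dspam, score_sa, train_dspam):
    """Return 'spam' or 'ham', training dspam from SA when SA is sure."""
    verdict, confidence = classify_dspam(msg)   # e.g. ('spam', 0.92)
    if confidence > DSPAM_CONFIDENT:
        return verdict                          # skip SA entirely
    score = score_sa(msg)
    if score < SA_SURE_HAM:
        train_dspam(msg, 'ham')                 # autolearn ham from SA
        return 'ham'
    if score > SA_SURE_SPAM:
        train_dspam(msg, 'spam')                # autolearn spam from SA
        return 'spam'
    # neither filter is sure: fall back to dspam's best guess
    return verdict
```

This keeps the expensive SA pass off mail that dspam is already confident about, while letting SA's sure verdicts train the per-user dspam dictionary.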
My experience is that most spam is detected by both, but:
- sometimes one of the filters helps detect an FP;
- sometimes one of the filters detects spam that the other doesn't
(dspam can't do URIBL lookups, for example).
Interesting. I hadn't thought to choose delivery with LDAP instead of transports. I suppose this would eliminate my need for multiple Postfix instances. Good suggestion, I'll have to look into this.
No. LDAP is a lookup method, not a Postfix replacement. You still need
multiple instances because you need different transport configurations,
whether you use mysql, ldap, pgsql, hash, etc.
FILTER doesn't work because of multi-recipient mail: only one filter is
applied per message, not per recipient.
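A sketch of what the multi-instance setup might look like (paths, the second config directory, and the filter endpoints are illustrative): each instance carries its own content_filter in its own main.cf, so each user population gets a different filter chain even for multi-recipient mail.

```
# Instance 1 -- /etc/postfix/main.cf (illustrative): dspam-only users
content_filter = dspam:unix:/var/run/dspam.sock

# Instance 2 -- /etc/postfix-sa/main.cf (illustrative): SA+dspam users
content_filter = smtp-amavis:[127.0.0.1]:10024
```

Each instance would also need its own queue_directory and its own listening address or port, and the second instance would be managed with "postfix -c /etc/postfix-sa".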
Trap accounts are great, but I always worry that they get different spam than real accounts and pollute the Bayesian database. Has anyone experienced this? Also, how does SpamAssassin deal with the Bayes-poisoning attempts seen recently (spam emails with garbage text in them)?
I can't say; I don't entirely trust traps either. But in my experiments, they
seem to improve accuracy for "immature" users. Spammers could attack
traps (by posting "ham text"), but I haven't seen that yet.
I could pretty much trust a small subset of users to be fairly
regular in their training. There is a somewhat larger portion of [...]
They might be telling less trustworthy users how to take part in the training
process and then pollute your Bayes DB. Those less careful users should
be managed with amavisd-new LDAP profiling.
LDAP profiling... I haven't seen examples of that yet. I will definitely research it.
I personally think that no user should train anything but his own DB.
This is why I like the idea of a site-wide + per-user setup: the
site-wide DB doesn't rely on people. I have tried implementing dspam
groups, but couldn't find a safe way to do it.
The individual DBs sound painful, but I'm not sure how consistent our users are. I guess I will have to watch the Bayesian accuracy as it's built and make a decision later.
This is why I like having a combination of a site-wide filter and a per-user
filter: for "lazy" users, the site-wide filter is enough; for "active" users,
the per-user filter will help them increase filtering accuracy.
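The fallback logic described here could be sketched as follows. The maturity threshold and the classifier interface are assumptions for illustration, not dspam's actual API:

```python
# Hypothetical site-wide + per-user combination: an "active" user's
# mature personal database takes over; a "lazy" user with an immature
# personal database falls back to the site-wide classifier.
MATURITY_TOKENS = 2500  # assumed cutoff for a "mature" user dictionary

def classify(msg, user_db, site_db):
    """Return (verdict, source), preferring a mature per-user db."""
    if user_db.token_count >= MATURITY_TOKENS:
        return user_db.classify(msg), 'per-user'
    return site_db.classify(msg), 'site-wide'
```

The point of the design is that the site-wide db keeps working with zero user effort, while a user who trains diligently eventually gets a filter tuned to his own mail.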
It seems as if most of the recommendations advise against trying to force-feed your Bayesian database. I suppose there are no shortcuts if you want it to be accurate.
The problem is one of symmetry. Spam is sent "randomly", so you can use
public corpora (this is not completely true, but it is an "acceptable
hypothesis" for an address that gets a lot of spam). Ham, on the other
hand, depends closely on the recipient. I use multiple addresses, and I
often get the same spam at almost all of them, but, with few exceptions,
all the ham is different. (This is one of the reasons I abandoned the
idea of dspam groups.)
I am beginning to think I won't be able to select new rulesets until the system
is online and I have a current metric to go by.
You might want to grab the SARE *0.cf rules. Except for a very few rules
that generate FPs, they catch a lot of spam.