Re: Better spam filter for postfix

Steve Thu, 15 Jul 2010 17:17:50 -0700

-------- Original Message --------
> Date: Fri, 16 Jul 2010 02:09:43 +0300
> From: Henrik K <h...@hege.li>
> To: postfix-users@postfix.org
> Subject: Re: Better spam filter for postfix


> On Thu, Jul 15, 2010 at 11:16:43PM +0200, Steve wrote:
> > > >
> > > > If you are looking for something that goes beyond just being better, then I
> > > > recommend CRM114, DSPAM or OSBF-Lua. If you insist on having the AV
> > > > included in the anti-spam tool, then use something like DSPAM.
> > > 
> > > I'd consider those as "engines". You can run one or all of them if you
> > > really want. MailScanner, Amavisd-new, Mimedefang and even SA (as a
> > > framework) are some of the "glues" that might utilize them.
> > > 
> >
> > Well.... those so called "engines" can run on their own. They don't need
> > to be wrapped inside any of the "glues" you mention. Especially not when
> > those "glues" are memory hogs.
> 
> Can you be more specific? Maybe you are addressing SA memory usage, which
> might only matter in some cases. Servers have lots of memory these days, and
> good MTA checks might reduce scanning needs greatly.
> 
Yes. Servers have a lot of memory these days, but not so much that it can be
wasted. And memory is not my only point. My biggest problem with tools such as
SA is that they are very slow compared to other solutions out there. In general
I can say that I classify x messages per second with filter XYZ, while
SpamAssassin needs x seconds per message. All the tests I have done with
SpamAssassin in the past confirm that statement. And for me system resources
are important, be it memory, CPU cycles, throughput, etc.


> > > Generally DSPAM etc require user interaction/learning.
> > >
> > So do CRM114 and OSBF-Lua. But you are wrong in thinking that they need
> > an insane amount of training/learning.
> 
> That's what I meant with "etc". I did use DSPAM exclusively for a few months
> in the past, but for my personal use I saw no benefits from it.
> 
Okay.


> > > SA does not, since
> > > it's a framework of rules and plugins and can autolearn Bayes if you want to
> > > - or even do the same for DSPAM etc if you use them as SA plugins. Let's not
> > > forget that DSPAM etc also require a database backend,
> > >
> >
> > You are WRONG. DSPAM does NOT require a database backend. I don't know
> > where you got that from. DSPAM CAN use a database backend but runs
> > well without one (using the Hash driver).
> 
> So you don't consider the CSS Hash driver a "database backend"? It requires
> disk, memory and CPU to store and retrieve tokens. Whatever..
> 
Well... it has a structure, but I would not consider it a database in the
classical sense. If the CSS file is a database then an XML file is a database
too, and I personally don't consider an XML file to be a database.


> > > which might require
> > > lots of memory and/or disk, so it's not exactly "free" either. Accuracy
> > > depends heavily on configuration of all the components and other voodoo.
> > >
> >
> > What? Voodoo? Yeah right. There is less voodoo in CRM114, OSBF-Lua and
> > DSPAM than in SA. I explain the following to a user:
> > * you get mail, and if it is wrongly classified by the anti-spam filter
> >   then you correct it and the filter will learn.
> > * a wrong classification is based on YOUR prior classifications
> >   that you have fed to the anti-spam filter.
> > * if you feed wrong data to the anti-spam filter then the filter will
> >   make errors.
> > * the more you correct, the higher the accuracy gets, and you need to
> >   correct errors less and less.
> > 
> > That's easy to understand.
> > 
> > 
> > IMHO that is easier to explain than telling the user:
> > * there is an army of rule writers out there writing rules for
> >   SA, where THEY decide what is spam and what is ham.
> > 
> > And if the user asks me: what rules are those?
> > Then I would need to say that there are a gazillion rules that I cannot
> > explain in detail without taking up much of his time to go through all the
> > rules one by one.
> > 
> > Anyway...
> 
> So you have made your point. You prefer (or are required) to have the user in
> control.
> 
Yes. The big problem is that no solution out there is 100% accurate for all
users. So the only way to make users happy is to hand control over to them.
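
The correct-on-error routine from the bullet list quoted above can be sketched roughly as follows. This is a toy word-count Bayes classifier written purely for illustration; it is not the algorithm used by DSPAM, CRM114 or OSBF-Lua, and the names TinyBayes and train_on_error are made up for the example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-count classifier (illustrative only, not a real product's algorithm)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Store the user's correction: count every word under the given label.
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def classify(self, text):
        words = text.lower().split()
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"])) + 1
        total_msgs = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Add-one smoothing keeps unseen words from zeroing a score.
            prior = (self.messages[label] + 1) / (total_msgs + 2)
            total = sum(self.tokens[label].values())
            score = math.log(prior)
            for word in words:
                score += math.log((self.tokens[label][word] + 1) / (total + vocab))
            scores[label] = score
        # Ties fall through to "ham", so an untrained filter delivers everything.
        return "spam" if scores["spam"] > scores["ham"] else "ham"


def train_on_error(filt, stream):
    """Feed (text, true_label) pairs; learn only when the filter got it wrong."""
    corrections = 0
    for text, truth in stream:
        if filt.classify(text) != truth:
            filt.learn(text, truth)
            corrections += 1
    return corrections


mail = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("buy cheap watches now", "spam"),
    ("monday lunch meeting", "ham"),
]
filt = TinyBayes()
first_pass = train_on_error(filt, mail)
second_pass = train_on_error(filt, mail)
print(first_pass, second_pass)
```

On this tiny made-up mail set the number of corrections drops between the first and the second pass, which is the "you need to correct less and less" effect in miniature.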


> I guess you don't use ANY other methods (blacklists etc) than users' own
> statistical input, since you might have to tell your users that "THEY"
> thought your mail was spam?
> 
No. I use other methods. A lot of them. I even developed my own stuff based on
research papers from anti-spam researchers/companies. My setup consists of
several defense rings around Postfix. Each ring has its own techniques, and the
farther a ring is from Postfix, the fewer resources it uses. However, each
domain owner and/or user has control over the rings. They can turn them on or
off, depending on their needs. I preset which are on and which are off, but in
the end each of them is controllable by the end user (or by the domain owner,
whose settings take precedence over user rules). Some stuff, however, is not
controllable by the end user or domain owner: stuff like SPF checks and DKIM
checks/signing. That cannot be turned off.
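
A minimal sketch of that precedence logic, under stated assumptions: only SPF and DKIM are named above as mandatory, so the other ring names, the preset values and the function name active_rings are hypothetical and stand in for whatever a real setup would use.

```python
# Hypothetical ring names, ordered from the outermost (cheapest) ring inward.
RINGS = ["dnsbl", "greylisting", "spf", "dkim", "content_filter"]

# Checks that nobody can switch off (SPF and DKIM, as described above).
MANDATORY = {"spf", "dkim"}

# Assumed admin presets for rings that nobody has configured explicitly.
DEFAULTS = {"dnsbl": True, "greylisting": False, "content_filter": True}


def active_rings(domain_prefs, user_prefs):
    """Return the rings that run for one recipient.

    Precedence: mandatory > domain owner > user > preset default.
    """
    active = []
    for ring in RINGS:
        if ring in MANDATORY:
            enabled = True
        elif ring in domain_prefs:
            enabled = domain_prefs[ring]
        elif ring in user_prefs:
            enabled = user_prefs[ring]
        else:
            enabled = DEFAULTS.get(ring, False)
        if enabled:
            active.append(ring)
    return active


# A user tries to disable SPF and the DNSBL ring; the domain owner forces greylisting on.
print(active_rings({"greylisting": True},
                   {"greylisting": False, "spf": False, "dnsbl": False}))
```

The lookup order is the whole design: a mandatory check never consults preferences, a domain owner's setting shadows the user's, and the preset only applies when neither has an opinion.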

I know this sounds very complicated, but the problem is that when you offer
mail services to others you can't possibly make all of them happy with a
simple setup. Each individual has his/her own view on how mail should work,
and so on and so on. Sooner or later you stop arguing, implement what you
think is good and set it as the default, but you let the owner of the domain
or his/her users control whatever they think is right for them.


> > For me the three mentioned products are all better than SA because they
> > have a smaller memory footprint than SA, are way faster than SA and,
> > properly set up, require less maintenance and are way more accurate than
> > SA.
> 
> Good for you. Naturally resource usage is lower, the less stuff you do. One
> has to balance needs against that.
> 
I perfectly understand that.


> But let's forget the accuracy bs, there are too many variables for such
> generic claims to be made. You can achieve "happy users" with pretty much
> any tool out there if used right.
> 
That is right (I mean the part about "happy users"). However, you cannot deny
that some tools are known to be better than others. Just look at OSBF-Lua, for
example. That beast won at TREC 2006 and was number 1 at the CEAS 2008
Spam Filter Live Challenge.


> I'm in a happy position to be able to reject/quarantine spam for 1000+ users
> without ever bothering them with it, and very rarely get any questions about
> mail. If I had to do it the ISP way, I might consider DSPAM, then again I
> see nothing against using SA (or any other tool out there).
> 
In principle I don't see anything against SA either. I know setups that
filter millions of mails per day with SA without any issue. Their hardware
requirements are huge compared to other solutions, but in the end that has to
be okay for them; if high hardware requirements are not an issue for them,
then so be it.


> > And regarding the training: DSPAM and CRM114 offer features where you can
> > pre-learn, so that your users already have a high accuracy from day one
> > (generally above 95%), and if they re-classify the first bunch of
> > errors then their accuracy easily jumps over 98.x%/99.x%. In DSPAM that kind
> > of setup is accomplished with merged groups or classification groups or
> > shared groups. In CRM114 you can at run time allocate and merge as many
> > CSS files (one pre-trained should be enough) as you like
> 
> You make it sound like statistical filters are invincible against different
> mail flows and pure user stupidity.
> 
No, no. User stupidity is unbeatable. No machine learning can compensate for
user stupidity.


> > > There are no easy answers.
> > >
> >
> > And this is generally the field where anti-spam tools that do not depend
> > on pre-made rules shine, because they are very adaptive.
> 
> Right, like SA for example only depends on "pre-made" rules and doesn't have
> any statistical or realtime capabilities..
> 
It has both. Still, my main concern regarding SA is its resource usage. If I
set up SA on one system, test how fast it is, how much memory it uses and how
accurate it is, and then compare those metrics on the same hardware with one
of the other mentioned tools, then SA looks pretty bad.


> I think continuing this is pointless and a bit off-topic.
>
Yes. It is off-topic. This shall be my last response to you on this topic.

btw: SA is a good tool. I absolutely see a need for something like SA. I only
spoke for myself: in the 10+ years I have been doing filtering, SA has always
turned out to be one of the slower solutions for me. And I would rather invest
some time at the beginning in implementing a complex solution than constantly
babysit a filtering solution. I have no problem with keeping a filter solution
up to date (blocklists/whitelists need that kind of attention), but for a
content filter I have no time to fiddle around with more than 10,000
individual user configuration rules. So something like DSPAM is the better
solution for me. It gives me greater control and lets me quickly update and
make changes without having to update individual user configuration files, or
at least it lets me update the settings for all users at once with a single
command.