Re: SA not correctly classifying spam

Martin Gregorie Thu, 28 Nov 2013 14:35:47 -0800

On Thu, 2013-11-28 at 19:33 -0200, Sergio Durigan Junior wrote:

> Having said that, my SA is still missing lots of spams.  For example,
> take a look at:
> 
>   <http://sergiodj.net/~sergio/sa/spam.txt>
> 
This doesn't score much better here (2.9), but it's a type of spam I
don't see. However, it has two suspicious features (to me, anyway):


1) It's apparently from a gmail address with a gmail Message-ID, yet it
was sent directly, i.e. it hasn't been through a gmail MTA, so the sender
address and Message-ID are almost certainly forged. No URBLs fire because
there are no headers for them to trigger on. If I were getting much of
this, I'd probably write a local rule that fires if both the sender and
the Message-ID are gmail but there are no gmail Received headers.

2) I've never seen legitimate mail with a *.html filename where the
Content-Type was NOT text/html, so I'd probably write a local rule for
that too.
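In SpamAssassin the two checks above would normally be written as local header/meta rules. Purely to illustrate the logic, here is a minimal Python sketch using the standard `email` package; it is not Martin's rule set. The function names and the regexes deciding what counts as a "gmail" address or a Google Received host are assumptions of this sketch. It also trusts every Received header, so a spammer who forges a Google Received line would slip past it.

```python
import re
from email import message_from_string
from email.message import Message

# Assumption: mail that really went through Gmail has a Received header
# naming a google.com or gmail.com host (e.g. mail-xx.google.com).
GOOGLE_HOST = re.compile(r"\b(?:google|gmail)\.com(?![\w.-])", re.IGNORECASE)
# Assumption: matches user@gmail.com and Message-IDs like <...@mail.gmail.com>.
GMAIL_DOMAIN = re.compile(r"@(?:[\w-]+\.)*gmail\.com(?![\w.-])", re.IGNORECASE)


def forged_gmail(msg: Message) -> bool:
    """Check 1: From and Message-ID both claim gmail, but no Received header does."""
    from_gmail = bool(GMAIL_DOMAIN.search(str(msg.get("From", ""))))
    msgid_gmail = bool(GMAIL_DOMAIN.search(str(msg.get("Message-ID", ""))))
    # Note: every Received header is trusted here, including forged ones.
    via_google = any(GOOGLE_HOST.search(str(h)) for h in msg.get_all("Received", []))
    return from_gmail and msgid_gmail and not via_google


def html_name_wrong_type(msg: Message) -> bool:
    """Check 2: a MIME part named *.html whose Content-Type is not text/html."""
    for part in msg.walk():
        # get_filename() reads Content-Disposition filename, else Content-Type name.
        name = part.get_filename()
        if name and name.lower().endswith(".html") and part.get_content_type() != "text/html":
            return True
    return False
```

A message parsed with `message_from_string(raw_text)` can be passed to either function; both return `True` when the suspicious pattern is present.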

>   <http://sergiodj.net/~sergio/sa/spam2.txt>
> 
> It's a classical spam, I think.  The score is even higher than the first
> spam.  But it's still not catching it.
> 
This does score high here (18.0) because it's obvious phishing spam and
hits my local anti-phishing rules. I can't say exactly why it scores that
high, because these rules have been built up over time and contain a
large collection of trigger phrases. I use my "portmanteau rule" assembly
tool to define this type of rule, which contains many alternative
patterns, because it makes creating and maintaining them much easier. See:

http://www.libelle-systems.com/free/

and you'll find the portmanteau tool toward the end of the page in the
SpamAssassin section.
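The portmanteau tool's own input format isn't described here, so the following is only a hypothetical Python sketch of the general idea, not that tool: fold a list of trigger phrases into one alternation regex and emit it as a SpamAssassin `body` rule with matching `describe` and `score` lines. The function names, the rule name LOCAL_PHISH_PHRASES, the phrases and the score are all invented for illustration.

```python
import re
from typing import Iterable


def perl_literal(phrase: str) -> str:
    """Escape a literal phrase for a Perl regex; runs of whitespace become \\s+."""
    words = phrase.split()
    # Backslash every non-word character (including '/', the rule delimiter).
    escaped = [re.sub(r"([^A-Za-z0-9_])", r"\\\1", w) for w in words]
    return r"\s+".join(escaped)


def portmanteau_rule(name: str, phrases: Iterable[str], score: float, description: str) -> str:
    """Fold many alternative trigger phrases into one case-insensitive body rule."""
    alternatives = "|".join(perl_literal(p) for p in phrases)
    return "\n".join([
        f"body {name} /(?:{alternatives})/i",
        f"describe {name} {description}",
        f"score {name} {score}",
    ])


if __name__ == "__main__":
    print(portmanteau_rule(
        "LOCAL_PHISH_PHRASES",  # hypothetical rule name
        ["verify your account", "update your billing details", "account suspended!"],
        3.0,
        "Common phishing trigger phrases",
    ))
```

Keeping the phrases as a plain list and generating the rule from it means adding a new trigger phrase is a one-line change, which is the kind of maintenance saving described above.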

Unlike Bayes, portmanteau rules don't need to build up history before
they can catch spam. So they *may* work better with types of spam that
contain a few characteristic phrases, where each phrase comes from a
large pool of possibilities. The same goes for sales spam.

But, as always, YMMV.
 

Martin
