On Thu, 2013-11-28 at 19:33 -0200, Sergio Durigan Junior wrote:
> Having said that, my SA is still missing lots of spams. For example,
> take a look at:
>
> <http://sergiodj.net/~sergio/sa/spam.txt>
>
This doesn't score much better here (2.9) but this is a type of spam I
don't see. However, there are two suspicious features (to me anyway):

1) It's apparently from a gmail address with a gmail Message-ID, yet it
   was sent direct, i.e. it hasn't been through a gmail MTA, so the
   sender address and Message-ID are almost certainly forged. No URBLs
   fire because there are no headers for them to trigger on. If I were
   getting much of this, I'd probably write a local rule that fires if
   both the sender and the Message-ID are gmail but there are no gmail
   Received headers.

2) I've never seen legitimate mail with a *.html filename where the
   Content-Type was NOT text/html, so I'd probably write a local rule
   for that too.

> <http://sergiodj.net/~sergio/sa/spam2.txt>
>
> It's a classical spam, I think. The score is even higher than the first
> spam. But it's still not catching it.
>
This does score high here (18.0) because it's obvious phishing spam and
hits my local anti-phishing rules. I can't say exactly why, because these
rules have been built up over time and have a large collection of trigger
phrases.

I use my "portmanteau rule" assembly tool to define rules of this type,
which contain many alternate patterns, because it makes their creation
and maintenance much easier. See:

http://www.libelle-systems.com/free/

and you'll find the portmanteau tool toward the end of the page in the
Spamassassin section.

Unlike Bayes, portmanteau rules don't need to build up history before
they can catch spam. So they *may* work better with types of spam that
contain a few characteristic phrases, where each phrase comes from a
large pool of possibilities. The same goes for sales spam. But, as
always, ymmv.

Martin
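
The two checks described in points 1) and 2) can be prototyped outside
SpamAssassin before being written as local rules. The following is a
minimal Python sketch of the logic only; it is not SpamAssassin rule
syntax and not a rule taken from this thread. The function names are
hypothetical, and the header patterns are assumptions: that Gmail
Message-IDs end in "@mail.gmail.com>" and that mail relayed by Gmail has
at least one Received header mentioning google.com.

```python
from email import message_from_string
from email.utils import parseaddr


def looks_like_forged_gmail(msg):
    """True if From and Message-ID claim Gmail but no Received header does."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    msgid = msg.get("Message-ID", "").strip().lower()
    received = " ".join(msg.get_all("Received", [])).lower()

    from_gmail = sender.endswith("@gmail.com")
    # Assumption: Gmail-generated Message-IDs end in "@mail.gmail.com>".
    msgid_gmail = msgid.endswith("@mail.gmail.com>")
    # Assumption: mail relayed by Gmail names google.com in a Received header.
    via_google = "google.com" in received

    return from_gmail and msgid_gmail and not via_google


def has_html_name_mismatch(msg):
    """True if any MIME part is named *.html or *.htm but is not text/html."""
    for part in msg.walk():
        # get_filename() falls back to the Content-Type "name" parameter.
        name = (part.get_filename() or "").lower()
        if name.endswith((".html", ".htm")) and part.get_content_type() != "text/html":
            return True
    return False
```

Running both functions over a saved copy of a message (for example, one
parsed with message_from_string) is a cheap way to see how often each
condition fires on ham before assigning a score to an equivalent local
rule.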