On Fri, 2013-05-31 at 12:15 -0700, tony wrote:
> now I've given it some more thought it's a certain type of spam that gets
> through. I never get scams or marital aid spam, never. The spam mails that
> get through are consistently to do with hospitality, increased sales and
> business courses. Are there rules I should be using to combat this type of
> spam?
> 
You can train Bayes to recognise these messages as spam.

Everybody's spam stream is a bit different, so you could also analyse a
moderate collection of these messages for common phrases and write a set
of rules that (a) recognise types of phrase and (b) recognise and
penalise messages that contain combinations of them.

For instance, I get fairly high reliability by writing low scoring
(0.001) rules to recognise selling phrases and product names and then
using high scoring (5.0 or more) meta rules that combine them to
recognise sales spam. Here's a set to give you the idea: 

body  SALES_TERM /(low cost|available now|.....)/
score SALES_TERM 0.001

body  PRODUCT    /(Nike|Dior|Tommy Hilfiger|......)/
score PRODUCT    0.001

meta  SALES_SPAM (SALES_TERM && PRODUCT)
score SALES_SPAM 5.0

This approach ensures that other messages can contain selling phrases or
product names without being treated as spam, but any message that
contains at least one of each picks up an extra 5.0 points, which on its
own reaches SpamAssassin's default threshold (required_score 5.0) and
gets it dinged as spam.
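To see why the low-scoring sub-rules stay harmless on their own, here is a toy Python model of the scoring arithmetic. It is only an illustration under simplifying assumptions: the phrase lists are shortened, and the `score` function and the `SUB_RULES`/`META_RULES` tables are hypothetical. Real SpamAssassin body rules run against the decoded, rendered message text, and many other rules contribute to the total.

```python
import re

# Hypothetical, shortened versions of the body rules: (pattern, score).
SUB_RULES = {
    "SALES_TERM": (re.compile(r"low cost|available now"), 0.001),
    "PRODUCT": (re.compile(r"Nike|Dior|Tommy Hilfiger"), 0.001),
}

# Meta rules: a test over the set of sub-rule hits, plus a score.
META_RULES = {
    "SALES_SPAM": (lambda hits: "SALES_TERM" in hits and "PRODUCT" in hits, 5.0),
}


def score(body: str) -> tuple[float, set[str]]:
    """Return the total score and the names of the rules that fired."""
    hits = {name for name, (rx, _) in SUB_RULES.items() if rx.search(body)}
    total = sum(SUB_RULES[name][1] for name in hits)
    for name, (test, pts) in META_RULES.items():
        if test(hits):
            hits.add(name)
            total += pts
    return total, hits


print(score("The Nike store opened"))
print(score("Nike trainers, available now at low cost"))
```

A message that only mentions "Nike" collects 0.001 points; one that also says "available now" triggers SALES_SPAM and totals about 5.002.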

HTH

Martin