Hello Mike,

Monday, September 22, 2003, 1:47:50 PM, you wrote:

MK> In response to my complaint of weak rules out of box:

Mike, the value of Spamassassin is its ability to weight
based on a combination of factors, rather than simply delete
email based on single words or phrases. Words that seem
spammy to you may not be spam to everyone. For example, a
urologist or a pharmacist might send and receive legitimate
email containing words that would seem like spam to the rest
of us.
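
To make the weighting idea concrete, here is a rough sketch in Python. It is not how Spamassassin is implemented; the rule names, scores, and threshold below are invented purely for illustration. The point is only that no single rule decides the outcome: each matching rule adds points, and the total is compared to a threshold.

```python
# Illustrative only: these rule names, scores, and the threshold are
# made up for this sketch and are not real Spamassassin rules.
RULE_SCORES = {
    "body_mentions_drug_name": 1.5,
    "subject_all_caps": 1.0,
    "sent_by_bulk_mailer": 2.0,
    "sender_on_blocklist": 3.0,
}
THRESHOLD = 5.0


def total_score(hits):
    """Sum the scores of every rule that matched the message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits):
    """Flag a message only when the combined score reaches the threshold."""
    return total_score(hits) >= THRESHOLD


# A pharmacist's legitimate mail trips one word rule and stays under
# the threshold.
print(is_spam(["body_mentions_drug_name"]))

# The same word plus suspicious header factors adds up to a spam verdict.
print(is_spam(["body_mentions_drug_name",
               "sent_by_bulk_mailer",
               "sender_on_blocklist"]))
```

With these made-up numbers the first message is not flagged and the second one is, even though both contain the same "spammy" word.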

MOST spam not only has bad words in the text, but also has a
number of header-related factors that help weight it, for
example the fact that most spam is sent out by bulk mail.
Without those extra factors, how would Spamassassin tell the
difference between someone legitimately trying to contact me
with a business proposal (such as wanting to place ads on my
website or arrange to resell some of our products) and the
vast amount of spam that comes through purporting to be a
business proposal?

Spamassassin works VERY well if used in conjunction with RBL
checks and Bayes, because within a very short time it
becomes quite adept at spotting spam patterns.

Spamassassin also allows you to adjust your own scores - for
example, I have decided that for MY users (I manage an
in-house server for a company that sells educational
products and services) - there should NEVER be a valid
reason to receive email with certain obscene language, so I
added my own custom recipes to give higher scores to such
words, and also increased some of the Spamassassin scores.
I also used my experience with the legit mail we receive to
create negative scores for words and phrases that appear
frequently in mail from legitimate customers.

What you should do to test SA is send some real spam through
it - or simply wait and see how it does.

There will always be some spam that gets through.  What I do
when this happens is (a) feed the spam to my Bayes database,
and (b) if the headers show the actual domain that sent the
spam - and it is clearly not forged - ban the IP or domain
from ever sending email to my server again using my sendmail
access file.
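
As a rough sketch of step (b), the small Python helper below turns a list of offending senders into lines for a sendmail access file. The helper name and the sample address and domain are hypothetical. It assumes the usual access-file layout of a key, a tab, and an action such as REJECT, and it only builds the text; it does not touch any file.

```python
def access_entries(senders, action="REJECT"):
    """Build access-file lines (key, tab, action) for the given senders.

    `senders` may hold IP addresses or domain names. Entries are
    lower-cased, de-duplicated, and sorted so the output is stable.
    """
    cleaned = {s.strip().lower() for s in senders if s.strip()}
    return [f"{key}\t{action}" for key in sorted(cleaned)]


# Hypothetical offenders taken from (unforged) Received headers.
for line in access_entries(["Spammer.Example", "192.0.2.15", "spammer.example"]):
    print(line)
```

The resulting lines would be appended to the access file, and the access database then has to be rebuilt (typically with makemap) before sendmail picks up the change.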

Spamassassin blocks several hundred instances of spam daily,
and within the same time frame I see maybe 10 or so spams
leak through.  Bayes is incredible -- so you really should
use it and then check again with your invented spam once you
have an adequate corpus of learned spam and ham.  Once Bayes
is involved, you will see that it often is the factor that
pushes questionable email over the threshold. Sometimes I
see Bayes assign only a 70% probability rather than the 99%
that gives a higher score -- but basically, I seldom see
Bayes get it wrong.
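
Here is a minimal sketch of how a Bayes probability can tip a borderline message. The probability bands, the points attached to them, and the threshold are assumptions chosen for illustration, not the scores Spamassassin ships with.

```python
# Hypothetical bands: (lower bound of Bayes probability, points added).
# Listed from highest to lowest so the first match wins.
BAYES_BANDS = [(0.99, 4.0), (0.90, 3.0), (0.70, 1.5)]
THRESHOLD = 5.0


def bayes_points(probability):
    """Return the points for the highest band the probability reaches."""
    for lower_bound, points in BAYES_BANDS:
        if probability >= lower_bound:
            return points
    return 0.0


def final_score(other_rules_score, probability):
    """Add the Bayes points to whatever the other rules already scored."""
    return other_rules_score + bayes_points(probability)


# A questionable message that the other rules leave below the threshold.
other_rules = 3.8
print(other_rules >= THRESHOLD)
print(final_score(other_rules, 0.70) >= THRESHOLD)
```

In this sketch a 70% probability adds fewer points than a 99% one would, but it is still enough to push the borderline message over the line.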

-Abigail



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
