You can run 'sa-learn --ham' on mail folders that the email user has
already read and culled of spam.

After I did that, my Bayesian filter got surprisingly accurate.
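
A minimal sketch of that kind of ham training, assuming each folder is stored as a single mbox file. The folder paths in the example and the SA_LEARN override (handy for a dry run) are hypothetical and would need adjusting to the local setup; only the --ham and --mbox options come from sa-learn itself.

```shell
#!/bin/sh
# Assumption: one mbox file per mail folder. Maildir setups would point
# sa-learn at the directory instead of using --mbox.
# SA_LEARN is a hypothetical override; set SA_LEARN=echo to preview the
# commands without touching the Bayes database.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Feed every given mbox file to sa-learn as ham.
train_ham() {
    for mbox in "$@"; do
        # Skip arguments that are not regular files
        [ -f "$mbox" ] || continue
        $SA_LEARN --ham --mbox "$mbox"
    done
}

# Example call (assumed paths):
#   train_ham "$HOME/mail/read" "$HOME/mail/family"
```

Afterwards, 'sa-learn --dump magic' prints the nham and nspam counters, which gives a quick check of how many messages of each kind the database has learned.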

Chris Shaker
[EMAIL PROTECTED]


----- Original Message ----- From: "Dave Hills" <[EMAIL PROTECTED]>
To: <users@spamassassin.apache.org>
Sent: Saturday, January 08, 2005 2:06 PM
Subject: Re: A very long spam



I try to train as much HAM as I can, but I don't think it's possible to train HAM and SPAM equally, since 90% of incoming email is SPAM.


On Jan 8, 2005, at 1:55 PM, Fajar Priyanto wrote:

At 04:34 AM 1/9/2005 +0700, you wrote:
Hi all,
Greetings. I've just joined the list.

I've been using sa-learn with SA 2.64 and 3.0.2.
One thing is bugging me though. Is it safe to train SA on a very long spam
such as the stock report spam? Will it cause many false positives?

Why would you think it would?

By trying to avoid training that message you're poisoning your bayes
database for false negatives.

Train spam as spam, train ham as ham. Let the statistics deal with the
overlap. By trying to avoid training "spamish" ham or "hamish" spam you're
just doing your training a big disservice by making it unrealistic.

Thanks Matt,
So, statistically speaking, does that mean I have to train SA on as much
'ham' as 'spam'? Right now, I train SA mostly on spam.

