-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robert Menschel
Sent: Friday, September 05, 2003 8:25 PM
To: Phil N
Cc: [EMAIL PROTECTED]
Subject: Re: [SAtalk] FW: Feedback on how identified spam is being handled

> All spam is then kept to be used as part of our corpus. Our spam corpus
> is nearing 20k messages -- we'll probably start deleting the oldest spam
> shortly.

Here's a newbie question for you:

When you say "used as part of our corpus," what are you actually using it
for? Are you using it to train the Bayesian filter system?

If so, what is the easiest way to do this?

Since I don't want the stuff appended by SA to be part of the email used to
train Bayesian, I have to go through each message (I use PINE for this) and
write out the ORIGINAL message to a separate file which I then use for
training.

Is there an easier way to do this? If I had to deal with thousands of
messages, manually writing out each original message would take up a
large chunk of my time.

A better way?
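
One possible way to automate the manual step described above is sketched
below in Python. It is only an illustration under stated assumptions, not
SpamAssassin's own method: it assumes the SA additions are either extra
"X-Spam-*" headers or a wrapper message that carries the original as a
message/rfc822 attachment and is marked with an "X-Spam-Flag" header. The
function names and file paths are hypothetical. SpamAssassin's own
"spamassassin -d" (remove markup) option may already cover this, so check
the man page for your version first.

```python
import email
import mailbox


def unwrap_original(msg):
    """Return the original message if SA wrapped it, else msg itself.

    Assumption: a wrapped message carries an X-Spam-Flag header and holds
    the original as a message/rfc822 attachment.
    """
    if msg.get("X-Spam-Flag") is None or not msg.is_multipart():
        return msg
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return msg


def strip_sa_headers(msg):
    """Delete every X-Spam-* header in place and return the message."""
    for name in [k for k in msg.keys() if k.lower().startswith("x-spam-")]:
        del msg[name]
    return msg


def clean_mbox(src_path, dst_path):
    """Copy an mbox folder, removing SA additions from each message.

    Returns the number of messages written to dst_path.
    """
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for msg in src:
            dst.add(strip_sa_headers(unwrap_original(msg)))
            count += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return count
```

Calling clean_mbox("spam-folder", "spam-clean") on a saved PINE folder
(both paths are placeholders) would write a cleaned copy that could then
be passed to the Bayes training step, instead of exporting each message
by hand.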

William L. Polhemus, Jr. P.E.
Polhemus Engineering Company
Katy, Texas USA 




