-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Robert Menschel
Sent: Friday, September 05, 2003 8:25 PM
To: Phil N
Cc: [EMAIL PROTECTED]
Subject: Re: [SAtalk] FW: Feedback on how identified spam is being handled

> All spam is then kept to be used as part of our corpus. Our spam corpus
> is nearing 20k messages -- we'll probably start deleting the oldest spam
> shortly.

Here's a newbie question for you: When you say "used as part of our corpus," what are you actually using it for? Are you using it to train the Bayesian filter system?

If so, what is the easiest way to do this? Since I don't want the markup added by SA to be part of the email used to train the Bayesian filter, I have to go through each message (I use PINE for this) and write out the ORIGINAL message to a separate file, which I then use for training.

Is there an easier way to do this? If I had to deal with thousands of messages, manually writing out each original message would take a large chunk of my time.

A better way?

William L. Polhemus, Jr. P.E.
Polhemus Engineering Company
Katy, Texas USA

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
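
The manual step described above (writing each ORIGINAL message out to its own file) could in principle be scripted. The following is only a minimal sketch, and it rests on an assumption that should be checked against your own mail: that SA has wrapped each original message as a message/rfc822 attachment inside a report message. The function name extract_original is hypothetical, and the sketch uses only Python's standard email package. Messages that do not match the assumed layout are returned unchanged. It is also worth checking the sa-learn and spamassassin documentation for your installed version, since the markup may already be handled for you (for example through the spamassassin -d / --remove-markup option).

```python
import email
from email import policy


def extract_original(raw_bytes: bytes) -> bytes:
    """Return the first attached message/rfc822 part as bytes.

    Assumption: the SA report wraps the original mail as a
    message/rfc822 attachment. If no such part is found, the
    input is returned unchanged.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.compat32)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For a message/rfc822 part the payload is a one-element
            # list holding the embedded Message object.
            if isinstance(payload, list) and payload:
                return payload[0].as_bytes()
    return raw_bytes
```

To use it, read each saved spam file in binary mode, pass the bytes to extract_original, and write the result to a separate directory that is then used for training.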