On Fri, 16 Jan 2004 08:51:34 -0800, cube <[EMAIL PROTECTED]> writes:

> Does anyone have a good way of collecting ham for the bayesian
> filters.  I can collect spam quite easily, but mixed in with my ham
> is all kinds of spam.  (There is a buttload of spam with less hits
> than 1.)

I manually clean my inbox of any spam that gets through and move it
into a special junk.spam.missed folder. What remains in the inbox I can
then use as ham. I can also use my outgoing email folder as ham. (Since
many of my messages are replies, this means that I get body tokens from
the original message for free.)

> I read everywhere that I should do this process manually to ensure
> the quality of ham; this process sucks.  I wrote a little browser
> gui to do this, but it is still taking a while to get 1000 hams.
> 
> Is this normal and I am just lazy?

Here's an idea: Have your GUI program display email sorted by SA
score. After manually classifying email near the extremes (highest and
lowest scores) and training Bayes on it, reprocess the entire spool
through SA a second time. Hopefully Bayes will have pushed more email
scores up or down. In a sense, you're not classifying email, SA is
doing it. What you're doing is catching errors.
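
A rough sketch of the sorting step, in Python. It assumes each message
has already been run through SA and carries an X-Spam-Status header
containing either "hits=" or "score=" (the field name differs between
SA versions). The function names sa_score, sort_by_score and extremes
are made up for illustration; they are not part of SA.

```python
import re
from email import message_from_string

# Assumption: the score appears in X-Spam-Status as "hits=N" or "score=N".
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def sa_score(raw_message):
    """Return the score found in X-Spam-Status, or None if there is none."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def sort_by_score(raw_messages):
    """Return (score, message) pairs, lowest score first.

    Messages without a parsable score are left out.
    """
    scored = []
    for raw in raw_messages:
        score = sa_score(raw)
        if score is not None:
            scored.append((score, raw))
    return sorted(scored, key=lambda pair: pair[0])


def extremes(ranked, n):
    """Split off the n lowest- and n highest-scoring pairs for review.

    The two slices never overlap, even when the list is short.
    """
    low = ranked[:n]
    high = ranked[max(n, len(ranked) - n):]
    return low, high
```

The GUI would show the "low" batch as likely ham and the "high" batch
as likely spam, let you correct the mistakes, feed the results to
sa-learn, and then rescan the spool and repeat on what is left.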

If you do this, can you submit the program to contrib?

Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
