On Fri, 16 Jan 2004 08:51:34 -0800, cube <[EMAIL PROTECTED]> writes:

> Does anyone have a good way of collecting ham for the bayesian
> filters. I can collect spam quite easily, but mixed in with my ham
> is all kinds of spam. (There is a buttload of spam with less hits
> than 1.)

I manually clean my inbox of any spam that gets through and move it
into a special junk.spam.missed folder. What is left in the inbox I can
then use as ham. I can also use my outgoing email folder as ham. (Since
many of my messages are replies, this means that I get body tokens from
the original message for free.)

> I read everywhere that I should do this process manually to ensure
> the quality of ham; this process sucks. I wrote a little browser
> gui to do this, but it is still taking a while to get 1000 hams.
>
> Is this normal and I am just lazy?

Here's an idea: Have your gui program display email sorted by SA
score. After manually classifying the email near the extremes (highest
and lowest scores), feed it to bayes and reprocess the entire spool
through SA a second time. Hopefully bayes will have pushed more email
scores up or down, so the next pass is easier.

In a sense, you're not classifying email; SA is doing it. What you're
doing is catching errors.

If you do this, can you submit the program to contrib?

Scott

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
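
For what it's worth, the sort-by-score idea in the message above could be sketched roughly like this. It is only an illustration, not a tested tool: the function names sa_score and review_queue and the per_end parameter are made up for the example, and it assumes each message already carries an X-Spam-Status header with a "hits=" (or, in newer SA releases, "score=") field.

```python
import re
from email import message_from_string

# Matches both the older "hits=" and the newer "score=" spelling.
_SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def sa_score(raw_message):
    """Return the score found in X-Spam-Status, or None if there is none."""
    msg = message_from_string(raw_message)
    match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def review_queue(raw_messages, per_end=50):
    """Pick the messages nearest the score extremes for manual review.

    Returns (likely_ham, likely_spam) as lists of (score, raw_message):
    likely_ham is ordered lowest score first, likely_spam highest first.
    Messages without a score are skipped.
    """
    scored = []
    for raw in raw_messages:
        score = sa_score(raw)
        if score is not None:
            scored.append((score, raw))
    scored.sort(key=lambda pair: pair[0])

    # Never hand the same message to both ends of the queue.
    n = min(per_end, len(scored) // 2)
    likely_ham = scored[:n]
    likely_spam = scored[len(scored) - n:][::-1]
    return likely_ham, likely_spam
```

After confirming or correcting the two lists by hand, the idea would be to train bayes on them (for example with sa-learn --ham and sa-learn --spam), rerun the spool through SA, and call review_queue again on the rescored mail.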