> From: cube
> Sent: Friday, January 16, 2004 8:52 AM
>
> Does anyone have a good way of collecting ham for the bayesian
> filters. I can collect spam quite easily, but mixed in with my ham
> is all kinds of spam.
> (There is a buttload of spam with less hits than 1.)
>
> I read everywhere that I should do this process manually to ensure the
> quality of ham; this process sucks. I wrote a little browser gui to do
> this, but it is still taking a while to get 1000 hams.
>
> Is this normal and I am just lazy?
>
It is a pain, especially on a big mailbox, and you need a large sample,
say 2000 or so each of ham and spam, to train the Bayes engine.
What I did was fire up 'mutt' and use its 'tag' capabilities to
tag the spam that I wanted to extract and deposit into my spam sample.
It is important to remember that this low-scoring spam
is exactly the stuff that will help Bayes do a better job.
Anyway, I'd first sort by sender's address, find the
obvious outliers who were spammers, tag those, and then write/append
them to a spam mbox. I'd also sort by subject and rescan manually
for spam. It still took some time, but doing things this way
eased the pain.
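
For anyone who would rather script the sort-by-sender step than do it
inside mutt, here is a rough sketch using Python's standard 'mailbox'
module. It is not what I actually ran; the function names
(group_by_sender, move_senders) and the file names in the usage note
are made up for illustration. It only groups messages by sender and
moves the senders you name, so deciding who is a spammer is still a
manual job. Try it on a copy of the mailbox first, since it does no
locking.

```python
import mailbox
from collections import defaultdict
from email.utils import parseaddr


def sender_of(msg):
    """Return the lowercased address from a message's From: header."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.lower()


def group_by_sender(mbox_path):
    """Map each sender address to the list of message keys from that sender."""
    groups = defaultdict(list)
    box = mailbox.mbox(mbox_path)
    try:
        for key, msg in box.iteritems():
            groups[sender_of(msg)].append(key)
    finally:
        box.close()
    return dict(groups)


def move_senders(src_path, dst_path, senders):
    """Append all mail from the given senders to dst_path and remove it
    from src_path. Returns the number of messages moved."""
    wanted = {s.lower() for s in senders}
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    moved = 0
    try:
        for key in list(src.keys()):
            msg = src[key]
            if sender_of(msg) in wanted:
                dst.add(msg)
                src.remove(key)
                moved += 1
        dst.flush()
        src.flush()
    finally:
        dst.close()
        src.close()
    return moved
```

Usage would be something like: call group_by_sender("inbox-copy.mbox"),
eyeball the addresses for the obvious spammers, then pass those
addresses to move_senders("inbox-copy.mbox", "spam.mbox", [...]).
Whatever is left in the source after a few passes (and a manual check
by subject) is the candidate ham.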
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk