On Thu, 11 Dec 2003 09:10:29 -0500, Adam Denenberg <[EMAIL PROTECTED]> posted to spamassassin-talk:
> What i want to start is a Bayes Corpus Project. I would like to be
> able to allow people to submit confirmed ham and/or spam to a large
> bayes corpus repository (or maybe just spam) where people could then
> download (or somehow do an sa-learn remotely) to an ongoing updated
> bayes corpus.
There are various efforts to collect representative email corpora for spam testing, but none of them are very successful IMHO. The main problem, as others have already pointed out, is getting hold of good, representative ham email. Privacy issues and everything notwithstanding, I think it would be beneficial to collect +something+, on a regular basis, to test against. Whether somebody would actually use the material for Bayes training is then up to them. It's probably not worth it, but having the material to prove that to yourself would be useful, too, as this tends to come up every once in a while.

The plan to collect spam alone is hardly worth pursuing. I can get all the spam I want (and then some) just by opening my inbox. If that doesn't suffice, there are places like the NANAS newsgroup (look for news.admin.net-abuse.sightings on an NNTP server near you) and various more or less semi-closed, half-started efforts that duplicate it. If anything, making heads and/or tails of what people are actually submitting to NANAS would be a useful contribution; I don't know of anybody who actually uses NANAS for anything real.

Paul Judge started the spam archive project <http://spamarchive.org> roughly a year ago, but the results so far are less than startling, and they don't seem to be responding to email. (Ah, they have mailing lists now? Hadn't noticed.)

/* era */

-- 
The email address era at iki dot fi is heavily spam filtered. If you want to reach me, see the contact information link on my home page at <http://www.iki.fi/era/> instead.
Just for kicks, imagine what it's like to get 500 pieces of spam for each wanted message.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk