On Thu, 11 Dec 2003 09:10:29 -0500, Adam Denenberg <[EMAIL PROTECTED]>
posted to spamassassin-talk:
 >  What i want to start is a Bayes Corpus Project.  I would like to be
 > able to allow people to submit confirmed ham and/or spam to a large
 > bayes corpus repository (or maybe just spam)  where people could then
 > download (or somehow do an sa-learn remotely) to an ongoing updated
 > bayes corpus.

There are various efforts to collect representative email corpora for
spam testing but none of them are very successful IMHO.

The main problem, as others already pointed out, is to get a hold of
good, representative ham email. Privacy issues and everything
notwithstanding, I think it would be beneficial to collect
+something+, on a regular basis, to test against.

Whether somebody would actually use the material for Bayes training is
then up to them. It's probably not worth it, but having the material
to be able to prove it to yourself would be useful, too, as this tends
to come up every once in a while.

The plan to collect spam alone is hardly worth pursuing. I can get all
the spam I want (and then some) just by opening my inbox. If that
doesn't suffice, there are places like the NANAS newsgroup (look for
news.admin.net-abuse.sightings on an NNTP server near you) and
various semi-closed, half-started efforts which duplicate
it. If anything, making heads and/or tails of what people are actually
submitting to NANAS would be a useful contribution -- I don't know of
anybody who actually uses NANAS for anything real.

Paul Judge started the spam archive project <http://spamarchive.org>
roughly a year ago but results so far are less than startling, and
they don't seem to be responding to email. (Ah, they have mailing
lists now? Hadn't noticed.)

/* era */

-- 
The email address era     the contact information   Just for kicks, imagine
at iki dot fi is heavily  link on my home page at   what it's like to get
spam filtered.  If you    <http://www.iki.fi/era/>  500 pieces of spam for
want to reach me, see     instead.                  each wanted message.


