On 12 Dec 2003, [EMAIL PROTECTED] moaned:
> On Thu, 11 Dec 2003 09:10:29 -0500, Adam Denenberg <[EMAIL PROTECTED]>
> posted to spamassassin-talk:
>  >  What I want to start is a Bayes Corpus Project.  I would like to be
>  > able to allow people to submit confirmed ham and/or spam to a large
>  > bayes corpus repository (or maybe just spam)  where people could then
>  > download (or somehow do an sa-learn remotely) to an ongoing updated
>  > bayes corpus.
> 
> There are various efforts to collect representative email corpora for
> spam testing but none of them are very successful IMHO.
> 
> The main problem, as others already pointed out, is to get a hold of
> good, representative ham email. Privacy issues and everything
> notwithstanding, I think it would be beneficial to collect
> +something+, on a regular basis, to test against.

Nah, what's really needed is a tool that merges Bayes DBs together. That
way someone could learn from a pile of ham and hand the DBs to people
for them to merge into their databases.
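
A rough sketch of what such a merge tool would do, assuming a Bayes DB is
nothing more than per-token spam/ham message counts plus the totals of
messages learned. The BayesDB class and the learn() and merge() functions are
made up for illustration; this is not SpamAssassin's on-disk format or API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BayesDB:
    """Hypothetical in-memory stand-in for a Bayes database."""
    nspam: int = 0                                # spam messages learned
    nham: int = 0                                 # ham messages learned
    spam: Counter = field(default_factory=Counter)  # token -> spam messages containing it
    ham: Counter = field(default_factory=Counter)   # token -> ham messages containing it


def learn(db: BayesDB, tokens, is_spam: bool) -> None:
    """Record one message. Only the set of tokens is kept, so word order is lost."""
    unique = set(tokens)
    if is_spam:
        db.nspam += 1
        db.spam.update(unique)
    else:
        db.nham += 1
        db.ham.update(unique)


def merge(local: BayesDB, donated: BayesDB) -> BayesDB:
    """Return a new DB whose counts are the sums of both inputs."""
    return BayesDB(
        nspam=local.nspam + donated.nspam,
        nham=local.nham + donated.nham,
        spam=local.spam + donated.spam,
        ham=local.ham + donated.ham,
    )


mine = BayesDB()
learn(mine, ["hello", "meeting", "hello"], is_spam=False)

theirs = BayesDB()
learn(theirs, ["viagra", "hello"], is_spam=True)

merged = merge(mine, theirs)
```

Summing counts treats the donated mail as if it had been learned locally; a
real tool might want to down-weight the donated DB instead.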

That should be a lot less confidential than the raw emails, because
the ordering of the tokens has been lost :)

The only problem then would be that some of the spammy tokens (the
header ones in particular) might never hit at any other site: but in
that case, expiry will zap them soon enough.
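
To illustrate why expiry takes care of this, here is a toy model that assumes
each token carries a last-seen time and is dropped once that time is too old.
The expire() function and the numbers are invented for the example;
SpamAssassin's real expiry logic is more involved.

```python
def expire(last_seen, now, max_age):
    """Keep only tokens that were seen within max_age time units of now."""
    return {tok: t for tok, t in last_seen.items() if now - t <= max_age}


# "imported_header" came from a merged DB and never hit locally,
# so its last-seen time was never refreshed.
last_seen = {"local_token": 95, "imported_header": 10}
kept = expire(last_seen, now=100, max_age=30)
```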


(If you're paranoid, you could check that no single token in there is
confidential in its own right: for example, bank account numbers or
important passwords, i.e., non-Mailman ones.)
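
One way to do that check before handing a DB over, sketched under the
assumption that tokens are stored as plain strings. The scrub() and
is_sensitive() helpers, the eight-digit threshold, and the denylist are all
hypothetical; pick rules that match whatever you consider secret.

```python
import re
from collections import Counter

# Assumption: any run of 8 or more digits might be an account number.
LONG_DIGITS = re.compile(r"\d{8,}")


def is_sensitive(token, denylist=frozenset()):
    """True if the token is explicitly denylisted or contains a long digit run."""
    return token in denylist or LONG_DIGITS.search(token) is not None


def scrub(counts, denylist=frozenset()):
    """Return a copy of the token counts with sensitive tokens removed."""
    return Counter({t: n for t, n in counts.items()
                    if not is_sensitive(t, denylist)})


counts = Counter({"invoice": 3, "12345678901": 1, "hunter2": 2})
safe = scrub(counts, denylist={"hunter2"})
```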

-- 
`...some suburbanite DSL customer who thinks kernel patches are some
 form of military insignia.' --- Bob Apthorpe

