-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

PieterB writes:
>http://au.spamassassin.org/hacking.html lists how to submit
>mass-check results. I have a couple of questions:
>
>* The CORPUS_POLICY lists that you should use hand-verified spam/ham
>  files, but the CORPUS_SUBMIT lists that you should only check the
>  top 20 spam/ham messages. I'm pretty sure my corpus is quite good,
>  but I don't want to check every message by hand. Can anybody
>  elaborate on this policy?

You pretty much *need* to check every message by hand -- to a degree.
Otherwise SpamAssassin will be trained against unreliable data, which is
worse than no training at all.

However, the "degree" is what's key here -- "by hand" can mean scanning
over the list of From/Subject lines and occasionally clicking on one
or two to verify that they are spam (or ham).  That's not very
time-consuming in general.
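
As a rough sketch of that kind of skim (this is not part of SpamAssassin; the function names and the mbox path are made up for illustration, and it assumes the corpus is stored in mbox format), something like the following Python collects one From/Subject pair per message so the list can be eyeballed quickly:

```python
import mailbox
from email.header import decode_header, make_header


def decode(value):
    """Best-effort decoding of a possibly MIME-encoded header value."""
    try:
        text = str(make_header(decode_header(str(value))))
    except Exception:
        # Spam often carries malformed or unknown encodings; fall back to raw text.
        text = str(value)
    # Collapse folded header lines and runs of whitespace onto one line.
    return " ".join(text.split())


def summarize(mbox_path):
    """Return (number, From, Subject) tuples for every message in an mbox file."""
    rows = []
    # create=False: fail on a mistyped path instead of creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for number, msg in enumerate(box, start=1):
            sender = decode(msg.get("From", ""))
            subject = decode(msg.get("Subject", ""))
            rows.append((number, sender, subject))
    finally:
        box.close()
    return rows
```

Printing each tuple from summarize("spam.mbox") gives a list to scan by eye; anything that looks out of place can then be opened in a mail reader and checked properly.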

>* I get about 4000 genuine spams per month and have a couple of
>  mailboxes that I'm sure contain only ham. I receive a lot of
>  both English and Dutch e-mail.
>
>* Are there any other contributors already submitting Dutch/English
>  corpora results?

Not that I know of...

>* Should the corpora be approx. 50% ham and 50% spam?

That's an ideal; don't worry about it too much, especially for
the nightly rule-QA stuff.
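
If you want to see how far off that ideal a corpus is, a small sketch like this reports the counts and the spam fraction (again assuming mbox files; the function name and paths are hypothetical, not SpamAssassin tooling):

```python
import mailbox


def corpus_balance(ham_path, spam_path):
    """Return (ham_count, spam_count, spam_fraction) for two mbox files."""
    ham_box = mailbox.mbox(ham_path, create=False)
    spam_box = mailbox.mbox(spam_path, create=False)
    try:
        ham = len(ham_box)
        spam = len(spam_box)
    finally:
        ham_box.close()
        spam_box.close()
    total = ham + spam
    # Avoid dividing by zero when both mailboxes are empty.
    fraction = spam / total if total else 0.0
    return ham, spam, fraction
```

A spam fraction near 0.5 matches the 50/50 ideal.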

>* How many people submit their mass-check results? How many messages
>  are in their corpora?

Right now, we've suspended it while the infrastructure that supports
it is moved to apache.org, but it should be back up *soon*.
Keep an eye on the SpamAssassin-dev list for an announcement when
we restart.

>Regards,
>
>Pieter
>
>BTW: is there an estimated release date set for SpamAssassin 2.70?

not yet ;)

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFABZrcQTcbUG5Y7woRAj3WAJ0WHwyfn+aiJeLzkkHiSn/bbc6YzQCg3reC
EFOJnpvbkfFFVuc0L282ebI=
=yKIm
-----END PGP SIGNATURE-----



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
